
@ddh0 (Contributor) commented Dec 11, 2025

This PR implements a new sampler that reshapes token probability distributions to favor tokens near a configurable target probability, rather than selecting from the highest-probability candidates. The technique is called Power Law sampling and it was originally described and implemented by @MrJackSpade here.

Theory

Traditional samplers operate on a simple principle: select from the most probable tokens. Power Law sampling takes a fundamentally different approach: select tokens whose probability falls near a configurable target value.

This treats probability space as navigable terrain, allowing you to intentionally sample from specific regions of the model's probability distribution rather than always defaulting to the top candidates.

How it works

  1. Compute original softmax probabilities
  2. Calculate the target probability (optionally adaptive based on recent history)
  3. Reshape the distribution using a power law transform that peaks at the target
  4. Sample from the reshaped distribution
  5. Record the original probability of the selected token for adaptive targeting

The power law transform assigns new logits based on distance from the target:

new_logit = 3.0 / (1 + (|p - target| / 0.2)^3)

Tokens near the target get high logits; tokens far from it get low logits.
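
As a rough illustration (not the PR's actual code; the function and variable names here are assumptions), the transform can be sketched in C++ as follows:

#include <cmath>
#include <vector>
// Sketch only: reshape original softmax probabilities so that tokens whose
// probability lies near `target` receive the highest new logits.
std::vector<float> power_law_transform(const std::vector<float> & probs, float target) {
    const float distribution_width = 0.2f; // how quickly logits fall off with distance from the target
    const float peak_logit_value   = 3.0f; // logit assigned when p == target
    std::vector<float> new_logits(probs.size());
    for (size_t i = 0; i < probs.size(); ++i) {
        const float d = std::fabs(probs[i] - target) / distribution_width;
        new_logits[i] = peak_logit_value / (1.0f + d * d * d);
    }
    return new_logits; // these are re-softmaxed before sampling
}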

Advantages

The sampler is designed to promote "mid-range" tokens - ones the model considers plausible but not dominant. This can help with:

  • Reducing repetitive, predictable outputs
  • Exploring creative alternatives that are still coherent
  • Maintaining variety over long generations via adaptive targeting

For example, with target=0.10, tokens in the 5-15% probability range get boosted, while dominant tokens at 60%+ get suppressed. Unlike pure temperature scaling, the sampler still respects the model's own confidence structure, so it avoids boosting actual nonsense.
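
Plugging numbers into the formula above with target = 0.10: a token at p = 0.10 receives the full peak logit of 3.0, a token at p = 0.22 still receives roughly 2.5, while a token at p = 0.65 drops to roughly 0.14, so the dominant token ends up heavily suppressed after re-softmaxing.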

Parameters

  • --power-law-target (float, 0.0-1.0, default 0.5): The probability value to favor.
  • --power-law-target-range (float, default 0.5): Adaptive range around the target. The actual target can shift within target ± range based on history. Set to 0.0 for a fixed target. The range is clamped internally to [0.0, 1.0].
  • --power-law-window-size (int, default 10): Rolling window size for adaptive targeting. When > 0, the sampler tracks the original probabilities of recent selections and nudges the target to maintain the desired average (see the sketch below). Set to 0 for a fixed target. The default of 10 works well in practice.
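
A rough sketch of what the adaptive targeting could look like is below (illustration only; the struct, field names, and the exact nudge formula are assumptions, not the PR's actual logic):

#include <algorithm>
#include <deque>
// Sketch only: track the original probabilities of recently selected tokens and
// nudge the effective target so the running average drifts back toward the
// configured base target. The real adjustment in the PR may differ.
struct power_law_state_sketch {
    float base_target  = 0.5f; // --power-law-target
    float target_range = 0.5f; // --power-law-target-range
    int   window_size  = 10;   // --power-law-window-size
    std::deque<float> history; // original probabilities of recent selections
    float effective_target() const {
        if (window_size == 0 || target_range == 0.0f || history.empty()) {
            return base_target; // fixed target
        }
        float avg = 0.0f;
        for (float p : history) { avg += p; }
        avg /= (float) history.size();
        // if recent picks were more probable than desired, aim lower (and vice versa),
        // never shifting outside target ± range or outside [0, 1]
        const float shift = std::clamp(base_target - avg, -target_range, target_range);
        return std::clamp(base_target + shift, 0.0f, 1.0f);
    }
    void record(float orig_prob) {
        history.push_back(orig_prob);
        while ((int) history.size() > window_size) {
            history.pop_front();
        }
    }
};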

Usage

Like greedy, dist, or mirostat, this sampler selects a token rather than just filtering candidates, so it must be the final sampler in the chain. Light filtering beforehand (such as a mild min-p) can help remove garbage tokens.

This sampler is intentionally not part of the default sampler chain. To enable it, add power_law (or power-law) to your sampler chain, e.g. with --samplers "top_k;min_p;power_law".
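
For instance, a full invocation might look like the following (the model path and the specific values are placeholders, not tuned recommendations):

llama-cli -m /path/to/model.gguf \
    --samplers "top_k;min_p;power_law" \
    --min-p 0.02 \
    --power-law-target 0.10 \
    --power-law-target-range 0.05 \
    --power-law-window-size 10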

@ddh0 (Contributor, Author) commented Dec 11, 2025

I think this is more or less ready for review now. Also pinging @MrJackSpade in case he'd like to chime in.

ddh0 marked this pull request as ready for review December 11, 2025 23:59
ddh0 requested a review from ggerganov as a code owner December 11, 2025 23:59
@ddh0 (Contributor, Author) commented Dec 12, 2025

Nevermind, sorry, I think we want to do a little more testing. I'm going to mark this as draft again temporarily.

ddh0 marked this pull request as draft December 12, 2025 02:55
@pnb (Contributor) left a comment:

This looks very interesting! I wish the original write-up had compared it to XTC, since the goals seem highly similar.

As an aside, I am curious if there is some way to make it work without selecting a token (i.e., only steps 1-3). I see why token selection is necessary, given the need to save the original probability to the history for the adaptive adjustment part. But, for example, maybe it would suffice instead to save the original probability of the highest-probability token after transforming, regardless of which one is eventually selected by a downstream sampler.


// fixed power law transform parameters (from original implementation)
const float distribution_width = 0.2f;
const float peak_logit_value = 3.0f;
pnb (Contributor) commented on this snippet:

Should these parameters be configurable like in the original implementation? There is probably a tradeoff with feature creep, having too many options for users to control, but some of these seem potentially important (especially distribution_width). Also, I noticed peak_logit_value is outside the range suggested in the original implementation; is that intentional?

ddh0 (Contributor Author) replied:

The original author and I are discussing the parameters over the next few days. I agree that the current implementation is probably not ideal, which is why I marked it back as draft.

I will post a comment in the main thread with an update once we've got it more figured out. Thank you!
