What Really Happens When an LLM Chooses the Next Token
Source: Dev.to
The Core Idea
Given a prompt, the model predicts a probability distribution over possible next tokens.
For example:
Twinkle twinkle little
At this point, the model assigns a probability to each candidate token. You can imagine them laid out on a 0–100 scale:
- Higher probability → larger segment
- Lower probability → smaller segment

Sampling: What Actually Happens
Next comes sampling. A practical way to think about it:
- Generate a random number.
- See which segment it falls into.
- Output the corresponding token.
Since "star" has the largest segment, it's the most likely result:
Twinkle twinkle little star
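The three steps above can be sketched in a few lines of Python. The probabilities here are illustrative, not real model output:

```python
import random

def sample(token_probs):
    """Pick a token by drawing a random number and finding its segment."""
    r = random.random()  # uniform in [0, 1)
    cumulative = 0.0
    for token, p in token_probs:
        cumulative += p  # segments laid end to end on the 0-1 line
        if r < cumulative:
            return token
    return token_probs[-1][0]  # guard against floating-point rounding

# Illustrative distribution for the next token after "Twinkle twinkle little"
probs = [("star", 0.55), ("car", 0.20), ("light", 0.15), ("dog", 0.10)]
print(sample(probs))  # "star" most often, but any token can come out
```

Run it a few times: "star" dominates, yet the smaller segments still get hit occasionally, which is exactly why LLM output varies between runs.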
Temperature, Top-p, and Top-k only affect this sampling step.
From here on we'll use the defaults:
- Temperature = 1
- Top-p = 1
- Top-k = 10
and change one parameter at a time.
Temperature
Temperature does one thing: it stretches or flattens probability differences.
- Lower temperature → strong preferences → stable output
- Higher temperature → flatter distribution → more randomness
In this example the gap between "star" and "car" is 19.6 points on the 0–100 scale.
- With Temperature = 0.5, the gap grows to 36.1.

- With Temperature = 1.68, lower-probability tokens become more competitive.

Key point: Temperature doesn't remove tokens; it only changes how strongly the model prefers one over another.
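Under the hood, temperature divides the model's raw logits before the softmax. A minimal sketch, using made-up logits for three candidate tokens (the values are assumptions for illustration):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by temperature, then convert to probabilities."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for "star", "car", "dog" (illustrative values)
logits = [4.0, 2.5, 1.0]

for t in (0.5, 1.0, 1.68):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Note that every token keeps a nonzero probability at every temperature; only the gaps between them stretch (low T) or flatten (high T), matching the key point above.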
Top-p (Nucleus Sampling)
Top-p controls how much probability mass is kept. The process is straightforward:
- Start from the highest-probability token.
- Keep adding tokens until cumulative probability ≥ Top-p.
- Drop the rest.
With Top-p = 0.6, only tokens covering 60% of total probability remain.

The remaining tokens are then renormalized:

- The number of tokens is dynamic.
- More peaked distributions keep fewer tokens.
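The cutoff-and-renormalize procedure can be sketched directly from the three steps above (again with an illustrative distribution, not real model output):

```python
def top_p_filter(token_probs, p=0.6):
    """Keep the most likely tokens until cumulative mass >= p, then renormalize."""
    ranked = sorted(token_probs, key=lambda kv: kv[1], reverse=True)
    kept, mass = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        mass += prob
        if mass >= p:  # nucleus reached: drop everything after this token
            break
    return [(token, prob / mass) for token, prob in kept]

# Illustrative distribution
probs = [("star", 0.55), ("car", 0.20), ("light", 0.15), ("dog", 0.10)]
print(top_p_filter(probs, p=0.6))  # only "star" and "car" survive
```

Here "star" alone covers 55%, so "car" is added to cross the 0.6 threshold and everything else is dropped; the survivors are rescaled to sum to 1.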
Top-k
Top-k is simpler: keep only the top K tokens.
- Top-k = 1 → always pick the most likely token.
- Top-k = 5 → sample from the top 5.
- Everything else is ignored.

In one line:
- Top-k limits quantity.
- Top-p limits probability mass.
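The contrast is easiest to see in code: Top-k truncates by count rather than by mass. A sketch with the same illustrative distribution as before:

```python
def top_k_filter(token_probs, k=5):
    """Keep only the K most likely tokens, then renormalize."""
    ranked = sorted(token_probs, key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return [(token, p / total) for token, p in ranked]

# Illustrative distribution (not real model output)
probs = [("star", 0.55), ("car", 0.20), ("light", 0.15), ("dog", 0.10)]

print(top_k_filter(probs, k=1))  # greedy: only "star" survives
print(top_k_filter(probs, k=2))  # sample from the top 2 after renormalizing
```

Unlike Top-p, the number of survivors here is fixed at K regardless of how peaked or flat the distribution is, which is exactly the "quantity vs. probability mass" distinction above.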
Demo
All visuals in this article come from the LLM Sampling Visualizer.
If sampling parameters feel abstract, five minutes with this tool builds intuition faster than reading more text.