What Really Happens When an LLM Chooses the Next Token🤯

Published: (January 11, 2026 at 10:16 PM EST)
2 min read
Source: Dev.to

Source: Dev.to

The Core Idea

Given a prompt, the model predicts a probability distribution over possible next tokens.

For example:

Twinkle twinkle little

At this point, the model assigns a probability to each candidate token. You can imagine them laid out on a 0–100 scale:

  • Higher probability → larger segment
  • Lower probability → smaller segment

Probability Distribution Chart

Sampling: What Actually Happens

Next comes sampling. A practical way to think about it:

  1. Generate a random number.
  2. See which segment it falls into.
  3. Output the corresponding token.

Since ā€œstarā€ has the largest segment, it’s the most likely result:

Twinkle twinkle little star

Temperature, Top‑p, and Top‑k only affect this sampling step.

From here on we’ll use the defaults:

  • Temperature = 1
  • Top‑p = 1
  • Top‑k = 10

and change one parameter at a time.

Temperature

Temperature does one thing: it stretches or flattens probability differences.

  • Lower temperature → strong preferences → stable output
  • Higher temperature → flatter distribution → more randomness

In this example the gap between ā€œstarā€ and ā€œcarā€ is 19.6.

  • With Temperature = 0.5, the gap grows to 36.1.

Temperature demo 1

  • With Temperature = 1.68, lower‑probability tokens become more competitive.

Temperature demo 2

Key point: Temperature doesn’t remove tokens; it only changes how strongly the model prefers one over another.

Top‑p (Nucleus Sampling)

Top‑p controls how much probability mass is kept. The process is straightforward:

  1. Start from the highest‑probability token.
  2. Keep adding tokens until cumulative probability ≄ Top‑p.
  3. Drop the rest.

With Top‑p = 0.6, only tokens covering 60 % of total probability remain.

Top‑p demo 1

The remaining tokens are then renormalized:

Top‑p demo 2

  • The number of tokens is dynamic.
  • More peaked distributions keep fewer tokens.

Top‑k

Top‑k is simpler: keep only the top K tokens.

  • Top‑k = 1 → always pick the most likely token.
  • Top‑k = 5 → sample from the top 5.
  • Everything else is ignored.

Top‑k demo 1

In one line:

  • Top‑k limits quantity.
  • Top‑p limits probability mass.

Demo

All visuals in this article come from the LLM Sampling Visualizer:

šŸ‘‰

If sampling parameters feel abstract, five minutes with this tool builds intuition faster than reading more text.

References

Back to Blog

Related posts

Read more Ā»

Instructions Are Not Control

!Cover image for Instructions Are Not Controlhttps://media2.dev.to/dynamic/image/width=1000,height=420,fit=cover,gravity=auto,format=auto/https%3A%2F%2Fdev-to-u...

AI-Radar.it

!Cover image for AI-Radar.ithttps://media2.dev.to/dynamic/image/width=1000,height=420,fit=cover,gravity=auto,format=auto/https%3A%2F%2Fdev-to-uploads.s3.amazona...