7 Magic Words That Make Your LLM 10× Smarter at Math
Source: Dev.to
🌐 Live demo (LOOK · UNDERSTAND · BUILD): https://dev48v.infy.uk/prompt/day2-chain-of-thought.html
Day 2 of my PromptFromZero series — 50 LLM techniques in 50 days, each visualized with LOOK / UNDERSTAND / BUILD.
Today: Chain of Thought (CoT). The single highest-impact prompt change you can make. Costs nothing. Adds 7 words. Often turns wrong answers into right ones.
The setup
Same problem. Same model. Two prompts.
Roger has 5 tennis balls. He buys 2 cans of 3 balls each.
How many balls does he have now?
Prompt A — “just answer”
…question… Just answer with the number, nothing else.
Small / older models often answer: 8. Wrong.
Prompt B — Chain of Thought
…same question… Let's think step by step.
Model writes:
Roger starts with 5 balls.
He buys 2 cans, each holding 3 balls.
2 × 3 = 6 new balls.
5 + 6 = 11.
Final answer: 11.
Right.
Same model. Same problem. Seven extra words in the prompt. On multi-step math problems the accuracy gain is often large, especially with smaller models.
Why it works
LLMs generate one token at a time, each token conditioned on every token that came before. If you ask for the answer with no working, the model has to compress the whole computation into a single number prediction. There is no scratch paper to work on.
Chain of Thought forces the model to write the scratch paper out. Each step becomes additional context for the next step. By the time it gets to “Final answer:”, the arithmetic is already on the page — anchored to real numbers, not vibes.
More tokens spent = more compute per problem = more reasoning capacity. CoT is literally trading latency for accuracy.
When to use it
Use CoT | Skip CoT
Math word problems | Factual lookups (“What’s the capital of France?”)
Multi-step logical reasoning | Creative writing
Cause-and-effect chains | Short summaries
Subtle classifications | Code completion
Heuristic: if you would write scratch-paper math yourself, the model will benefit from CoT.
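The heuristic above can be turned into a small routing helper. This is a minimal sketch: the task labels and the `buildPrompt` name are hypothetical, chosen to mirror the "Use CoT" column, and are not part of any SDK.

```javascript
// Task labels are made up for this sketch; they mirror the "Use CoT" column.
const COT_TASKS = new Set(["math", "logic", "cause-effect", "classification"]);

// Append the CoT trigger phrase only for task types that benefit from it.
function buildPrompt(question, taskType) {
  if (COT_TASKS.has(taskType)) {
    return question + "\n\nLet's think step by step.";
  }
  // Lookups, creative writing, summaries, etc. go through unchanged.
  return question;
}

console.log(buildPrompt("Roger has 5 tennis balls. He buys 2 cans of 3 balls each. How many balls does he have now?", "math"));
console.log(buildPrompt("What's the capital of France?", "lookup"));
```

Keeping the decision in one function makes it easy to adjust which task types get the longer (and slower) CoT prompt.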
Build it in 10 minutes
mkdir cot-from-zero && cd cot-from-zero
npm init -y
npm install ai @ai-sdk/google
echo "GOOGLE_GENERATIVE_AI_API_KEY=your_key_here" > .env
Get a free Gemini key at https://aistudio.google.com/apikey (no credit card).
// cot.mjs
import { generateText } from "ai";
import { google } from "@ai-sdk/google";

const model = google("gemini-2.5-flash");
const problem = "Roger has 5 tennis balls. He buys 2 cans of 3 balls each. How many balls does he have now?";

// Prompt A: ask for the bare number, no working.
const bad = await generateText({
  model,
  prompt: problem + "\n\nJust answer with the number, nothing else."
});

// Prompt B: same problem plus the Chain of Thought trigger phrase.
const good = await generateText({
  model,
  prompt: problem + "\n\nLet's think step by step."
});

console.log("=== Without CoT ===\n" + bad.text);
console.log("\n=== With CoT ===\n" + good.text);
node --env-file=.env cot.mjs
Two runs of the same model on the same problem, side by side. The difference is visible immediately.
Levels of CoT
- Zero-shot CoT (above)
Just add “Let’s think step by step.” Works on most modern models.
- Few-shot CoT
Prepend 2-3 worked examples before the question:
Q: Sara had 4 apples and got 2 more. How many?
A: Sara had 4. She got 2 more. 4 + 2 = 6. Answer: 6.
Q: Roger has 5 tennis balls. He buys 2 cans of 3 each. How many balls?
A: [model continues in same format]
Better on harder problems — the model has explicit examples of the reasoning depth you want.
- Structured CoT
Force a format:
"Solve this. Number your steps 1, 2, 3. Final answer on a new line starting 'Answer:'."
Easier to parse programmatically.
- Hidden CoT
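Generate the chain, then strip it before showing the user. This assumes the prompt asked the model to wrap its reasoning in <reasoning>…</reasoning> tags (the tag name is illustrative; use whatever delimiter your prompt specifies):
const reply = result.text;
// Remove every <reasoning>…</reasoning> span, leaving only the final answer.
const clean = reply.replace(/<reasoning>[\s\S]*?<\/reasoning>/g, '').trim();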
User sees just the answer; the model gets the accuracy benefit.
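The few-shot and structured levels above can be sketched as two small helpers. This is an illustration under stated assumptions, not a fixed recipe: the function names and the `{ q, a }` example shape are hypothetical, and `sampleReply` is a hand-written string standing in for a model response.

```javascript
// Few-shot CoT: join worked examples and the new question in one Q/A format.
// `examples` is an array of { q, a } objects (a shape assumed for this sketch).
function buildFewShotPrompt(examples, question) {
  const shots = examples.map(({ q, a }) => `Q: ${q}\nA: ${a}`);
  // End with an open "A:" so the model continues in the same format.
  return [...shots, `Q: ${question}\nA:`].join("\n");
}

// Structured CoT: pull the final answer out of a reply that follows the
// "Answer:" convention. Returns null when no such line exists.
function extractAnswer(reply) {
  const matches = [...reply.matchAll(/^Answer:\s*(.+)$/gm)];
  if (matches.length === 0) return null;
  // Use the last match in case an earlier step also starts with "Answer:".
  return matches[matches.length - 1][1].trim();
}

const fewShotPrompt = buildFewShotPrompt(
  [{ q: "Sara had 4 apples and got 2 more. How many?",
     a: "Sara had 4. She got 2 more. 4 + 2 = 6. Answer: 6." }],
  "Roger has 5 tennis balls. He buys 2 cans of 3 each. How many balls?"
);
console.log(fewShotPrompt);

// Hand-written sample reply in the structured format, not real model output.
const sampleReply = "1. Roger starts with 5 balls.\n2. 2 × 3 = 6 new balls.\n3. 5 + 6 = 11.\nAnswer: 11";
console.log(extractAnswer(sampleReply)); // prints "11"
```

Returning null instead of guessing lets the caller retry or fall back when the model ignores the requested format.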
What about reasoning models?
GPT-5, Claude Sonnet 4, o1, o3, Gemini 2.5: modern flagship models are trained with reasoning built in. They don’t need “Let’s think step by step.” They do it automatically.
But:
- They can cost around 10× more per token
- They’re slower (visible “thinking…” UI)
- They’re overkill for simple tasks
Cheap model + CoT prompt ≈ reasoning model output, at ~10% of the cost. CoT is still the highest-leverage technique you can use on small models.
What this unlocks
CoT is the foundation. Every fancier reasoning technique builds on top:
- Self-consistency: sample N CoT runs, take majority vote
- ReAct: CoT + tool calls interleaved (Day 1)
- Tree of Thoughts: branch CoT into multiple paths, evaluate
- Reflection: generate, criticize own output, regenerate
Master CoT first. Everything else is variations.
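As one example of building on CoT, the voting step of self-consistency can be sketched in a few lines. This is a minimal sketch, assuming the final answer has already been extracted from each CoT run as a string; `majorityVote` is a hypothetical helper name and the sample answers are hand-written, not real model output.

```javascript
// Return the most frequent answer; on a tie, the answer seen first wins.
function majorityVote(answers) {
  const counts = new Map();
  for (const answer of answers) {
    counts.set(answer, (counts.get(answer) ?? 0) + 1);
  }
  let best = null;
  let bestCount = 0;
  // Map iterates in insertion order, so strict ">" keeps the earliest on ties.
  for (const [answer, count] of counts) {
    if (count > bestCount) {
      best = answer;
      bestCount = count;
    }
  }
  return best;
}

// Five hand-written answers standing in for five sampled CoT runs.
console.log(majorityVote(["11", "8", "11", "11", "9"])); // prints "11"
```

Voting on the extracted answer rather than the full text matters: two runs can reason differently and still agree on the number.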
Try it now
Three tabs on one page:
https://dev48v.infy.uk/prompt/day2-chain-of-thought.html
- LOOK: animated side-by-side trace of both prompts
- UNDERSTAND: 8 click-through steps on why CoT works
- BUILD: copy the code, run it on your machine
What’s next in PromptFromZero
Day 3: Self-consistency. Sample 5 CoT runs, take majority vote. Same model, even higher accuracy.
Series: 50 LLM techniques · 50 days · Vercel AI SDK throughout.
🌐 All techniques: https://dev48v.infy.uk/promptfromzero.php
