How does caching help in LLM applications?
Source: Dev.to
What is caching?
Caching is the technique of storing frequently accessed data in a temporary, high‑speed storage (e.g., Redis). It reduces the compute load on the server for repeated requests and lowers latency.
How it helps with LLM API calls
In traditional API calls, the cost of a repeated request is mostly server compute: fetching data from a database or re-running some computation. LLM API calls add a direct monetary cost on top of that, because they are billed by token:
- Input tokens – tokens sent in the request.
- Output tokens – tokens generated in the response.
Because you pay per token, repeated identical queries can become expensive. Caching stores frequently asked queries and their responses, allowing you to serve the same result without incurring additional token costs.
Analogy
- User A: “What are some good places to visit in Japan?”
- User B: “I want to visit Japan; what are some good spots?”
Both prompts are semantically similar. Caching the response once lets you serve it to multiple users.
Caching not only reduces your API bill but also cuts latency. A cache hit typically takes 10–20 ms, whereas generating a fresh response can take 3–5 seconds.
Best use cases
- FAQ bots
- Retrieval‑augmented generation (RAG) with repeated queries
- Fixed prompt generation
- Educational/learning apps
- Scenarios where factual information is required
When to avoid caching
Caching is not suitable for every situation. Avoid it for:
- Legal or medical contexts
- User‑specific data
- Personalized outputs
- Real‑time data (e.g., stock prices, currency rates)
- Cases where creativity is a priority (temperature ≠ 0)
Distinguishing responses
To decide whether to cache, use a hash function that incorporates important parameters such as:
- Temperature (measure of randomness)
- Model name
- Prompt text
For multi‑turn conversations, include the full conversation history in the hash.
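A hedged sketch of such a cache key, hashing the model name, temperature, and the full message history together (the model name here is purely illustrative):

```python
import hashlib
import json

def cache_key(model: str, temperature: float, messages: list[dict]) -> str:
    # Include every parameter that can change the output. For multi-turn
    # chat, `messages` carries the full conversation history, so the same
    # question in a different context produces a different key.
    payload = json.dumps(
        {"model": model, "temperature": temperature, "messages": messages},
        sort_keys=True,  # deterministic serialization -> deterministic hash
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

history = [
    {"role": "user", "content": "What are some good places to visit in Japan?"},
]
key = cache_key("example-model", 0.0, history)  # illustrative model name
print(key)
```

Changing any one parameter (e.g. raising the temperature) yields a different key, so creative, non-deterministic requests never collide with cached deterministic ones.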
Cost analysis
Assume each API call costs $0.0001.
| Scenario | Requests (same prompt) | API calls billed | Total cost |
|---|---|---|---|
| Without caching | 100,000 | 100,000 | $10 |
| With caching | 100,000 | 1 (first request only) | $0.0001 |
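The arithmetic behind the table, under the stated assumption of a flat $0.0001 per call:

```python
COST_PER_CALL = 0.0001   # assumed flat cost per API call, as in the table
requests = 100_000       # identical requests for the same prompt

no_cache_cost = requests * COST_PER_CALL  # every request hits the API
with_cache_cost = 1 * COST_PER_CALL       # one miss, 99,999 cache hits

print(no_cache_cost)    # 10.0
print(with_cache_cost)  # 0.0001
```

Real bills scale with input and output token counts rather than a flat per-call price, so the savings grow with longer prompts and responses.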
Note
Caching introduces trade‑offs. Perform a thorough analysis to decide whether to use caching, avoid it, or adopt a hybrid approach.