Caching AI Responses in a Desktop App — Don't Pay Twice for the Same Question
Source: Dev.to

If a user closes the AI diagnosis overlay and reopens it, should you call Gemini again?
No. Cache the result. Same input → same output. No reason to burn rate‑limit quota.
The problem
Without caching
1. User clicks Diagnose on error line 847.
2. Gemini responds in ~3 seconds.
3. User closes the overlay.
4. User reopens the overlay.
5. Gemini is called again → another 3 seconds and another request.
With caching
Steps 4‑5 become instant, with zero API calls.
Cache key: hash the input
The same log context produces the same hash, which maps to the same cached result.
```rust
use std::collections::HashMap;
use sha2::{Sha256, Digest};

pub struct DiagnosisCache {
    entries: HashMap<String, CacheEntry>,
    max_size: usize,
}

#[derive(Clone)]
pub struct CacheEntry {
    pub result: String,
    pub created_at: std::time::Instant,
}

impl DiagnosisCache {
    pub fn new(max_size: usize) -> Self {
        Self { entries: HashMap::new(), max_size }
    }

    pub fn key(context: &str) -> String {
        let mut hasher = Sha256::new();
        hasher.update(context.as_bytes());
        format!("{:x}", hasher.finalize())
    }

    pub fn get(&self, key: &str) -> Option<&CacheEntry> {
        self.entries.get(key)
    }

    pub fn insert(&mut self, key: String, result: String) {
        // Evict the oldest entry if at capacity
        if self.entries.len() >= self.max_size {
            if let Some(oldest_key) = self.entries
                .iter()
                .min_by_key(|(_, v)| v.created_at)
                .map(|(k, _)| k.clone())
            {
                self.entries.remove(&oldest_key);
            }
        }
        self.entries.insert(key, CacheEntry {
            result,
            created_at: std::time::Instant::now(),
        });
    }
}
```
Using it in the command
```rust
#[tauri::command]
pub async fn diagnose(
    context: String,
    api_key: String,
    cache: tauri::State<'_, std::sync::Mutex<DiagnosisCache>>,
) -> Result<String, String> {
    let key = DiagnosisCache::key(&context);

    // Check cache first; the lock is dropped at the end of this block,
    // so it is never held across the await below.
    {
        let cache = cache.lock().unwrap();
        if let Some(entry) = cache.get(&key) {
            return Ok(entry.result.clone()); // instant
        }
    }

    // Cache miss — call Gemini
    let result = call_gemini(&context, &api_key).await?;

    // Store the result
    {
        let mut cache = cache.lock().unwrap();
        cache.insert(key, result.clone());
    }

    Ok(result)
}
```
Cache size
A capacity of 50 entries is enough for a typical session. Log lines change constantly, so a cache entry older than a couple of hours is rarely useful. You can clear the cache on app restart or add a TTL if desired.
```rust
// Register in main.rs
.manage(std::sync::Mutex::new(DiagnosisCache::new(50)))
```
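If you do want the TTL mentioned above, a minimal sketch follows — the `is_expired` helper and the two-hour `MAX_AGE` cutoff are assumptions of mine, not part of the original cache:

```rust
use std::time::{Duration, Instant};

// Assumed cutoff: treat cached diagnoses older than two hours as stale.
const MAX_AGE: Duration = Duration::from_secs(2 * 60 * 60);

// Hypothetical helper; `created_at` mirrors the CacheEntry field above.
fn is_expired(created_at: Instant) -> bool {
    created_at.elapsed() > MAX_AGE
}

fn main() {
    let fresh = Instant::now();
    assert!(!is_expired(fresh)); // a just-created entry is not expired
    println!("ttl check ok");
}
```

On the read path, `get` would then treat stale hits as misses, e.g. `cache.get(&key).filter(|e| !is_expired(e.created_at))`.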
Result
- First diagnosis: ~3 seconds.
- Subsequent identical diagnoses: instant.
Rate‑limit usage is cut dramatically for users who re‑examine the same errors.
Links
- Hiyoko PDF Vault →
- X →