I Cut My Claude Code Token Usage by 94% With This Open Source Tool
Source: Dev.to
The Problem
Input tokens are 85‑95% of your Claude Code bill. Every time you ask Claude about your payment flow, it reads payments.py, shipping.py, and whatever else it thinks might be relevant. That’s 45,000 tokens for a question that needs 800 tokens of context.
Without CCE: Claude reads payments.py + shipping.py = 45,000 tokens
With CCE: context_search "payment flow" = 800 tokens
How It Works
CCE runs as a local MCP server. Three lines to set up:
uv tool install code-context-engine
cd /path/to/your/project
cce init
That’s it. No cloud. No config. cce init auto‑detects your editor (Claude Code, VS Code, Cursor, Gemini CLI, Codex, OpenCode) and writes the right config.
Under the hood
- Tree‑sitter parses your code into semantic chunks (functions, classes, modules)
- Hybrid retrieval combines vector similarity with BM25 keyword matching
- Graph expansion walks CALLS/IMPORTS edges to pull in related code
- Compression reduces chunks to signatures and docstrings
- Memory persists decisions and code areas across sessions
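To make the hybrid retrieval step concrete, here is a minimal sketch of fusing a BM25 keyword ranking with a vector-similarity ranking. The post does not say how CCE combines the two signals, so the use of reciprocal rank fusion here is an assumption, and `hybrid_search`, the toy documents, and the two-dimensional "embeddings" are all hypothetical stand-ins rather than CCE's actual code.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    avg_len = (sum(len(t) for t in tokenized) / n) or 1.0
    # Document frequency: number of chunks in which each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank(scores):
    """Return document indices sorted best-first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

def hybrid_search(query, query_vec, docs, doc_vecs, top_k=3, rrf_k=60):
    """Fuse keyword and vector rankings with reciprocal rank fusion (assumed)."""
    keyword_rank = rank(bm25_scores(query, docs))
    vector_rank = rank([cosine(query_vec, v) for v in doc_vecs])
    fused = Counter()
    for ranking in (keyword_rank, vector_rank):
        for position, idx in enumerate(ranking):
            fused[idx] += 1.0 / (rrf_k + position + 1)
    return [idx for idx, _ in fused.most_common(top_k)]

# Toy chunks and hand-made 2-d vectors standing in for real embeddings.
docs = ["charge card payment", "ship order address", "refund payment charge"]
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
print(hybrid_search("refund payment", [0.9, 0.1], docs, doc_vecs, top_k=2))  # [2, 0]
```

Rank fusion is used in this sketch because BM25 scores and cosine similarities live on different scales; combining ranks sidesteps the need to normalize them.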
Re‑indexing after edits takes under 1 second, helped by a 96% embedding cache hit rate. Git hooks keep the index current automatically.
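One plausible way an embedding cache keeps re-indexing fast is to key embeddings by a hash of each chunk's content, so only edited chunks are re-embedded. The sketch below assumes that design; `EmbeddingCache` and `fake_embed` are hypothetical names, and the 25-chunk project is made up purely to show how a hit rate is computed.

```python
import hashlib

class EmbeddingCache:
    """Cache embeddings by content hash so unchanged chunks are not re-embedded."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, chunk_text):
        key = hashlib.sha256(chunk_text.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._embed_fn(chunk_text)
        return self._store[key]

    def reset_stats(self):
        self.hits = 0
        self.misses = 0

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

def fake_embed(text):
    # Stand-in for a real embedding model call.
    return [float(len(text)), float(text.count("def"))]

cache = EmbeddingCache(fake_embed)
chunks = [f"def f{i}(): pass" for i in range(25)]
for chunk in chunks:          # initial index: every chunk is a miss
    cache.get(chunk)

cache.reset_stats()
chunks[3] = "def f3(): return 1"   # simulate editing one function
for chunk in chunks:          # re-index: only the edited chunk is re-embedded
    cache.get(chunk)
print(cache.hit_rate)  # 0.96
```

In this toy run, 24 of 25 chunks are served from the cache, so the expensive embedding call runs once instead of 25 times.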
The Benchmark
We benchmarked against FastAPI (53 source files, 180K tokens) with 20 real coding questions. No cherry‑picking.
| Metric | Result |
|---|---|
| Retrieval savings | 94% (83,681 → 4,927 tokens/query) |
| Compression (additional) | 89% |
| Recall@10 | 0.90 |
| Latency p50 | 0.4 ms |
Important: The 94% is measured against full‑file reads, not against Claude Code’s built‑in exploration. We use full‑file reads as the baseline because they are reproducible and deterministic. Full methodology here.
You can reproduce it yourself:
pip install code-context-engine
python benchmarks/run_benchmark.py --repo https://github.com/fastapi/fastapi.git --source-dir fastapi
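The headline percentage can also be checked by hand from the token counts quoted in this post. A minimal sketch, where `savings_pct` is a hypothetical helper and not part of CCE:

```python
def savings_pct(baseline_tokens, retrieved_tokens):
    """Percentage of input tokens avoided relative to the baseline."""
    return 100.0 * (1 - retrieved_tokens / baseline_tokens)

# Benchmark average: full-file reads vs. retrieved context per query.
print(f"{savings_pct(83_681, 4_927):.1f}%")  # 94.1%

# The payment-flow example from the start of the post.
print(f"{savings_pct(45_000, 800):.1f}%")    # 98.2%
```

The single payment-flow example comes out higher than the benchmark figure, which is expected: the 94% is an average over 20 questions.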
What You Get
Nine MCP tools that Claude uses automatically:
- context_search: hybrid vector + BM25 search
- session_recall and record_decision: cross‑session memory
- related_context: code graph traversal
- set_output_compression: control response verbosity
- expand_chunk, record_code_area, index_status, reindex
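To illustrate what code graph traversal over CALLS/IMPORTS edges can look like, here is a minimal breadth-first sketch. The graph layout, the `expand` function, and the `max_hops` parameter are assumptions made for illustration; the post does not describe how related_context is implemented.

```python
from collections import deque

def expand(graph, seeds, max_hops=1, edge_types=("CALLS", "IMPORTS")):
    """Breadth-first walk from the seed chunks, following only the given edge types."""
    seen = set(seeds)
    order = list(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not walk past the hop limit
        for edge_type, target in graph.get(node, []):
            if edge_type in edge_types and target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order

# Hypothetical graph: node -> list of (edge type, target node).
graph = {
    "checkout": [("CALLS", "charge_card"), ("IMPORTS", "shipping")],
    "charge_card": [("CALLS", "stripe_client")],
    "shipping": [],
}
print(expand(graph, ["checkout"], max_hops=1))
# ['checkout', 'charge_card', 'shipping']
print(expand(graph, ["checkout"], max_hops=2))
# ['checkout', 'charge_card', 'shipping', 'stripe_client']
```

A hop limit matters here because each extra hop pulls in more chunks, and unbounded expansion would undo the token savings that retrieval provides.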
A live dashboard with token savings, donut charts, and session history:
cce dashboard
Dollar estimates fetched from live Anthropic pricing:
cce savings --all
Why Not Just Use Cursor’s Built‑in Indexing?
CCE is editor‑agnostic. One index works across Claude Code, VS Code, Cursor, Gemini CLI, and Codex. Your code never leaves your machine. You also get measured token savings, with dollar figures computed from live Anthropic pricing.
Languages Supported
AST‑aware chunking for Python, JavaScript, TypeScript, PHP, Go, Rust, and Java. Language‑aware fallback for 40+ more (C, C++, Swift, Kotlin, Ruby, Haskell, etc.). All text files are indexed.
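As a rough picture of AST‑aware chunking and signature-level compression, the sketch below uses Python's standard-library `ast` module as a stand-in for Tree‑sitter, so it only handles Python source. `chunk_and_compress` is a hypothetical name, and the output format is an assumption rather than CCE's own.

```python
import ast

def chunk_and_compress(source):
    """Split a module into top-level function/class chunks and a compressed form."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            full = ast.get_source_segment(source, node)
            # Simplification: treats the first line as the signature,
            # which breaks for signatures that span multiple lines.
            signature = full.splitlines()[0]
            doc = ast.get_docstring(node)
            compressed = signature if doc is None else f'{signature}\n    """{doc}"""'
            chunks.append({"name": node.name, "full": full, "compressed": compressed})
    return chunks

SAMPLE = '''
import os

def charge(amount, currency="usd"):
    """Charge the customer card."""
    fee = amount * 0.029
    return amount + fee

class Shipment:
    """A parcel on its way."""
    def track(self):
        return "in transit"
'''

for chunk in chunk_and_compress(SAMPLE):
    print(chunk["compressed"])
    print(f'  ({len(chunk["full"])} -> {len(chunk["compressed"])} characters)')
```

Each chunk keeps its full text alongside the compressed form, which mirrors the idea of returning a short summary first and expanding a chunk only when the body is needed.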
Try It
uv tool install code-context-engine
cd your-project
cce init
Three lines. See your savings in 60 seconds.
CCE is MIT licensed, free, and open source. Built by Elara Labs.