I Built a 35-Tool MCP Server That Cut My AI Token Usage by 95%
Source: Dev.to
Context‑Heavy AI Code Exploration – The Problem
Every time I asked Claude to help me with a codebase, the same thing happened:
- The model read file after file, burning through 50 K+ tokens just to understand the project structure.
- I hit the context limit before getting any real work done.
My Solution – MCP Repo Context Server
I built an MCP (Model Context Protocol) server that analyzes a codebase once, extracts everything an AI agent needs (function behaviours, call graphs, DB queries, HTTP calls, …), and serves precise answers in 2‑4 K tokens instead of 50 K+.
How It Works
When you point an AI agent at a codebase, it has no memory. Every session starts from scratch, runs grep, reads files one by one, and builds a mental model—slowly, expensively, and incompletely.
For a medium‑sized Go project (~100 files) a typical exploration burns:
| Issue | Tokens |
|---|---|
| Understanding what functions exist | ≈ 50 K |
| Multiple rounds of grep → read | – |
| Missed cross‑file relationships (call graphs) | – |
This isn’t an AI problem. It’s a context‑delivery problem.
Architecture
```
┌──────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   AI Agent   │────▶│    MCP Server    │────▶│  Storage Layer  │
│   (Claude)   │◀────│ (JSON‑RPC/stdio) │◀────│  JSON + SQLite  │
└──────────────┘     └──────────────────┘     └─────────────────┘
                              │
                      ┌───────┼───────┐
                      ▼       ▼       ▼
                 AST Parser  Vector  Call Graph
                    (Go)     Search   Builder
```
Three Layers that Make It Work
- AST Parsing – Using Go's `go/ast` package to extract:
  - Every function signature
  - Step‑by‑step behaviour
  - DB queries, HTTP calls, error‑handling patterns, side effects
  - Wrapped errors, deferred calls, goroutine launches (real syntax‑tree traversal, not regex)
- Semantic Vector Search – Each function/type gets a 384‑dimensional TF‑IDF embedding stored in SQLite.
  - Queries like "find authentication code" match semantically similar functions.
  - No external API calls; embeddings are computed locally.
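To show the local-embedding idea in miniature, here is a hashed bag-of-words into 384 buckets plus cosine similarity. It is a simplification: the real server stores true TF-IDF weights, which this sketch omits.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
	"strings"
)

const dims = 384

// Embed hashes each lowercased token into a fixed 384-bucket count
// vector (the "hashing trick"). Real TF-IDF would also weight each
// bucket by inverse document frequency.
func Embed(text string) []float64 {
	vec := make([]float64, dims)
	for _, tok := range strings.Fields(strings.ToLower(text)) {
		h := fnv.New32a()
		h.Write([]byte(tok))
		vec[h.Sum32()%dims]++
	}
	return vec
}

// Cosine returns the cosine similarity of two equal-length vectors.
func Cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	query := Embed("validate auth token")
	authFn := Embed("ValidateToken checks the auth token signature")
	htmlFn := Embed("RenderTemplate writes html output")
	// The auth-related text shares tokens with the query, so its
	// buckets overlap and it typically scores higher.
	fmt.Printf("auth: %.2f  html: %.2f\n", Cosine(query, authFn), Cosine(query, htmlFn))
}
```

Everything runs in-process with zero network calls, which is why the "no API keys, no rate limits" trade-off works.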
- Call‑Graph Extraction – Builds a complete call graph (direct, goroutine, deferred).
  - Powers tools such as `get_callers` and `visualize_call_graph` (Mermaid diagrams).
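The direct/deferred/goroutine distinction falls out of the AST naturally, because `go` and `defer` statements wrap their call expressions. A sketch that emits Mermaid-style edges (the sample source and output format are illustrative, not the tool's actual output):

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// Hypothetical input: one handler making a direct, a deferred,
// and a goroutine call.
const src = `package demo

func Handle() {
	setup()
	defer cleanup()
	go worker()
}

func setup()   {}
func cleanup() {}
func worker()  {}`

type Edge struct{ Caller, Callee, Kind string }

// BuildCallGraph records one edge per call site. ast.Inspect visits
// nodes in pre-order, so a GoStmt or DeferStmt is seen before the
// CallExpr it wraps, letting us tag the call's kind first.
func BuildCallGraph(source string) []Edge {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", source, 0)
	if err != nil {
		panic(err)
	}
	var edges []Edge
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Body == nil {
			continue
		}
		kinds := map[*ast.CallExpr]string{}
		ast.Inspect(fn.Body, func(n ast.Node) bool {
			switch s := n.(type) {
			case *ast.GoStmt:
				kinds[s.Call] = "goroutine"
			case *ast.DeferStmt:
				kinds[s.Call] = "deferred"
			case *ast.CallExpr:
				kind := kinds[s]
				if kind == "" {
					kind = "direct"
				}
				if id, ok := s.Fun.(*ast.Ident); ok {
					edges = append(edges, Edge{fn.Name.Name, id.Name, kind})
				}
			}
			return true
		})
	}
	return edges
}

func main() {
	fmt.Println("flowchart TD")
	for _, e := range BuildCallGraph(src) {
		fmt.Printf("    %s -->|%s| %s\n", e.Caller, e.Kind, e.Callee)
	}
}
```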
Sample Tools (35 total)
| Category | Tool | Description |
|---|---|---|
| Function Insight | get_function_context | Returns behaviour summary, execution steps, DB queries (actual SQL), HTTP calls (endpoints), error‑handling patterns, callers & callees. |
| Side‑Effect Search | search_by_side_effect | effect: "db_query" → every function touching the DB (with queries). Works for http_call, file_io, logging. |
| Concept Search | search_by_concept | concept: "authentication" → all auth‑related functions, powered by the semantic index. |
| Incremental Update | refresh_file | Re‑analyzes a single changed file in ~10 ms, updating stored context. |
| Visualization | visualize_call_graph | Generates a Mermaid flowchart of callers/callees at configurable depth. |
Token‑Usage Comparison
| Task | Agent‑Only (Claude) | MCP Server |
|---|---|---|
| Understand a function | ~50 K tokens | ~4 K tokens |
| Find related code | ~30 K tokens | ~2‑3 K tokens |
| After editing a file | Full re‑explore (tens of K tokens) | ~1‑2 K tokens (incremental refresh) |
| Natural‑language Q&A | Not possible | ~8 K tokens |
Result: 10‑25× reduction in token usage per query → no more context‑limit hits, fast, responsive AI assistance.
Implementation Details
- Dependencies: only `go-git` (Git ops) and `go-sqlite3` (vector storage). All other features (AST parsing, HTTP handling, JSON) use the Go standard library → tiny binary, minimal supply chain, trivial deployment.
- Embeddings: Chose local TF‑IDF over OpenAI's embedding API.
  - Sufficient quality for code search (function names & patterns are distinctive).
  - Works offline: zero latency, no API keys, no rate limits, no cost.
- Result Pagination: Search returns compact references; each includes a `detail_ref` that the AI can call to expand.
  - The AI gets a list of ~20 matches in ~2 K tokens, then fetches full details only for the 2‑3 it actually needs.
- Concurrency: Analyses of different repos run concurrently; operations on the same repo serialize.
  - Avoids a global mutex, allowing parallel work across services.
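One common way to get that behaviour is a lazily created per-repo mutex behind a small registry lock. This is a generic sketch of the pattern, not necessarily the server's implementation:

```go
package main

import (
	"fmt"
	"sync"
)

// repoLocks hands out one mutex per repository path: analyses of
// different repos proceed in parallel, while work on the same repo
// serializes on its lock.
type repoLocks struct {
	mu    sync.Mutex // guards the registry itself, held only briefly
	locks map[string]*sync.Mutex
}

func newRepoLocks() *repoLocks {
	return &repoLocks{locks: map[string]*sync.Mutex{}}
}

// get lazily creates and returns the mutex for a repo path.
func (r *repoLocks) get(repo string) *sync.Mutex {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.locks[repo] == nil {
		r.locks[repo] = &sync.Mutex{}
	}
	return r.locks[repo]
}

func main() {
	rl := newRepoLocks()
	var wg sync.WaitGroup
	for _, repo := range []string{"svc-a", "svc-a", "svc-b"} {
		wg.Add(1)
		go func(repo string) {
			defer wg.Done()
			m := rl.get(repo)
			m.Lock() // a second analysis of the same repo queues here
			defer m.Unlock()
			fmt.Println("analyzing", repo)
		}(repo)
	}
	wg.Wait()
}
```

The registry mutex is held only for map lookups, so it never blocks an in-flight analysis of another repo.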
- Language Support: Deep AST analysis is currently Go‑only; other languages get a generic extractor (basic structure, no behaviour details).
  - Next step: add Tree‑Sitter parsers for Python & TypeScript.
- Transport: STDIO only for now. The MCP spec also supports HTTP/SSE, which would let the server run as a long‑lived daemon shared across multiple AI sessions.
  - Presently each Claude Code session spawns its own server process.
Roadmap – Scaling to Organization‑Level Intelligence
Killer Feature: "What happens when someone hits `/login`?"
The server should trace the entire flow:
- Request → Service A `LoginHandler` → Service B `/auth/validate` → Kafka `user.verified` → Service C `VerificationHandler`.
To achieve this we need:
- HTTP Client Call Detection – Extract outbound HTTP calls, infer target services/endpoints.
- Message‑Bus (Kafka) Flow Tracking – Identify producers/consumers, map topics to handlers.
- Cross‑Service Call Graph – Merge per‑repo graphs into a global view.
- Transport Upgrade – Implement HTTP/SSE for a persistent daemon.
- Multi‑Language AST – Add Tree‑Sitter parsers for Python & TypeScript to broaden coverage.
TL;DR
- Problem: AI agents waste tens of thousands of tokens just to understand a codebase.
- Solution: Pre‑process the repo once, store rich AST‑derived metadata, and expose it via a lightweight MCP server.
- Result: 2‑4 K tokens per query, offline operation, fast incremental updates, and a clear path to organization‑wide code intelligence.
MCP Repo Context – Recent Enhancements
Overview
The MCP server is evolving to give developers richer, cross‑repo insights without needing to run the code. Below is a summary of the new capabilities, tools, and usage instructions.
1. End‑to‑End Request Tracing
- Static analysis now parses destination URLs, route registrations (e.g., `gorilla/mux`), and async message producers/consumers (Kafka, RabbitMQ, NATS).
- The data is matched across repositories to build a service‑to‑service flow graph.
- New tools:
  - `trace_api_flow` – traces a request from entry point to all downstream services.
  - `get_service_map` – visualizes the entire service connectivity map.
Static analysis for distributed tracing works on code that isn’t deployed yet, so no OpenTelemetry instrumentation is required.
2. Full Module Dependency Parsing
- The server now reads `go.mod` files, handling:
  - Direct & indirect dependencies.
  - `replace` directives.
  - Import classification (stdlib vs. internal vs. external).
- Tool: `get_dependency_graph` – outputs a Mermaid diagram showing how repositories depend on each other.
This forms the foundation for cross‑repo features.
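A hand-rolled reader for the `require` and `replace` sections is enough to illustrate the idea. A production version would use `golang.org/x/mod/modfile`; this sketch also ignores single-line `require` directives, and the sample `go.mod` is invented:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Hypothetical go.mod for a service.
const gomod = `module example.com/svc-a

go 1.22

require (
	github.com/go-git/go-git/v5 v5.12.0
	github.com/mattn/go-sqlite3 v1.14.22
	golang.org/x/sys v0.20.0 // indirect
)

replace example.com/shared => ../shared
`

type Dep struct {
	Path, Version string
	Indirect      bool
}

// ParseGoMod scans the require block and replace directives,
// flagging indirect dependencies by their trailing comment.
func ParseGoMod(src string) (deps []Dep, replaces map[string]string) {
	replaces = map[string]string{}
	inRequire := false
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, "require ("):
			inRequire = true
		case line == ")":
			inRequire = false
		case strings.HasPrefix(line, "replace "):
			parts := strings.Split(strings.TrimPrefix(line, "replace "), " => ")
			if len(parts) == 2 {
				replaces[strings.TrimSpace(parts[0])] = strings.TrimSpace(parts[1])
			}
		case inRequire && line != "":
			fields := strings.Fields(line)
			if len(fields) >= 2 {
				deps = append(deps, Dep{
					Path:     fields[0],
					Version:  fields[1],
					Indirect: strings.Contains(line, "// indirect"),
				})
			}
		}
	}
	return deps, replaces
}

func main() {
	deps, replaces := ParseGoMod(gomod)
	fmt.Println(len(deps), "deps; replaces:", replaces)
}
```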
3. Organization‑Wide Semantic Indexing
- Repositories can be grouped under an organization model.
- Added `search_org` tool that combines:
  - Keyword search.
  - Vector search.
  - Hybrid ranking via reciprocal rank fusion.
Example query: “find authentication code” → returns results from all 50+ repos, ranked by relevance.
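Reciprocal rank fusion itself is only a few lines: each document scores the sum of 1/(k + rank) over every list it appears in, with k = 60 by convention. The file paths below are invented for illustration:

```go
package main

import (
	"fmt"
	"sort"
)

// rrf merges ranked result lists with reciprocal rank fusion:
// score(d) = sum over lists of 1/(k + rank(d)), using 1-based ranks.
func rrf(k float64, lists ...[]string) []string {
	scores := map[string]float64{}
	for _, list := range lists {
		for i, doc := range list {
			scores[doc] += 1.0 / (k + float64(i+1))
		}
	}
	docs := make([]string, 0, len(scores))
	for d := range scores {
		docs = append(docs, d)
	}
	sort.Slice(docs, func(i, j int) bool {
		if scores[docs[i]] != scores[docs[j]] {
			return scores[docs[i]] > scores[docs[j]]
		}
		return docs[i] < docs[j] // deterministic tie-break
	})
	return docs
}

func main() {
	keyword := []string{"auth/login.go", "auth/token.go", "db/users.go"}
	vector := []string{"auth/token.go", "auth/middleware.go", "auth/login.go"}
	// auth/token.go ranks first: it is near the top of both lists.
	fmt.Println(rrf(60, keyword, vector))
}
```

RRF needs no score calibration between the keyword and vector rankers, which is why it is a popular choice for hybrid search.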
4. One‑Call PR Impact Analysis
Instead of multiple tool calls, agents can now invoke a single function:
- `analyze_pr_impact` – returns:
  - Changed function behaviours.
  - Callers affected.
  - Cross‑service impact.
  - Dependency‑level impact.
  - Risk assessment.
Pre‑built recipes for the three most common workflows (PR impact analysis, API flow explanation, architecture review) stay within an 8 K token budget.
5. Extensible Analyzer & Embedder
- Plugin interfaces introduced:
  - `AnalyzerPlugin` – add language support (TypeScript, Python, etc.).
  - `EmbedderPlugin` – swap embedding models.
Current experiment: Voyage Code‑3 embeddings, which score 16 % better than OpenAI's on code retrieval.
6. REST API & Multi‑Tenant Deployment
- Core tools are now wrapped as a REST API with:
- GitHub/GitLab webhook integration for auto‑analysis on push events.
- Multi‑tenant storage for organization isolation.
- Async analysis queuing.
Goal: Deploy once for an entire team instead of per‑developer.
7. Open‑Source Repository
https://github.com/yashpalsinhc/mcp-repo-context
8. Configuration for Claude Code
Add the following entry to your MCP config (JSON‑RPC over stdio):
```json
{
  "mcpServers": {
    "repo-context": {
      "command": "path/to/mcp-repo-context",
      "args": ["--data-dir", "~/.mcp-data"]
    }
  }
}
```
9. Typical Claude Code Interaction
> Analyze my local project at /path/to/repo
> What does the CreateUser function do?
> Find all database operations
> Show me the call graph for HandleLogin
10. Takeaway
If you’re building AI‑powered developer tools, the MCP ecosystem is worth exploring:
- Simple protocol (JSON‑RPC over stdio).
- Go‑based server side is performant and easy to extend.
- Turns expensive, slow AI exploration into fast, precise queries.
I use this server every day – it’s changed how I work with AI on code.