I Built a 35-Tool MCP Server That Cut My AI Token Usage by 95%

Published: February 26, 2026 at 02:36 AM EST
7 min read
Source: Dev.to

Context‑Heavy AI Code Exploration – The Problem

Every time I asked Claude to help me with a codebase, the same thing happened:

  • The model read file after file, burning through 50 K+ tokens just to understand the project structure.
  • I hit the context limit before getting any real work done.

My Solution – MCP Repo Context Server

I built an MCP (Model Context Protocol) server that analyzes a codebase once, extracts everything an AI agent needs (function behaviours, call graphs, DB queries, HTTP calls, …), and serves precise answers in 2‑4 K tokens instead of 50 K+.

How It Works

When you point an AI agent at a codebase, it has no memory. Every session starts from scratch, runs grep, reads files one by one, and builds a mental model—slowly, expensively, and incompletely.

For a medium‑sized Go project (~100 files) a typical exploration burns:

| Issue | Tokens |
| --- | --- |
| Understanding what functions exist | ≈ 50 K |
| Multiple rounds of grep → read | |
| Missed cross‑file relationships (call graphs) | |

This isn’t an AI problem. It’s a context‑delivery problem.

Architecture

┌──────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  AI Agent    │────▶│  MCP Server      │────▶│  Storage Layer  │
│  (Claude)    │◀────│  (JSON‑RPC/stdio)│◀────│  JSON + SQLite  │
└──────────────┘     └──────────────────┘     └─────────────────┘

                    ┌───────┼───────┐
                    ▼       ▼       ▼
              AST Parser  Vector  Call Graph
              (Go)        Search  Builder
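
The middle box speaks plain JSON‑RPC over stdio. As a rough illustration of what that transport looks like in Go (the envelope follows JSON‑RPC 2.0, but the dispatch map and the `ping` method here are hypothetical, not the server's actual tool names):

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

// request and response mirror the JSON-RPC 2.0 envelope used by MCP.
type request struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      int             `json:"id"`
	Method  string          `json:"method"`
	Params  json.RawMessage `json:"params"`
}

type response struct {
	JSONRPC string `json:"jsonrpc"`
	ID      int    `json:"id"`
	Result  any    `json:"result,omitempty"`
	Error   string `json:"error,omitempty"`
}

// handlers maps method names to tool functions; the entries are illustrative.
var handlers = map[string]func(json.RawMessage) (any, error){
	"ping": func(json.RawMessage) (any, error) { return "pong", nil },
}

// dispatch routes one parsed request to its handler.
func dispatch(req request) response {
	h, ok := handlers[req.Method]
	if !ok {
		return response{JSONRPC: "2.0", ID: req.ID, Error: "method not found"}
	}
	result, err := h(req.Params)
	if err != nil {
		return response{JSONRPC: "2.0", ID: req.ID, Error: err.Error()}
	}
	return response{JSONRPC: "2.0", ID: req.ID, Result: result}
}

func main() {
	// Read newline-delimited JSON-RPC requests from stdin, answer on stdout.
	scanner := bufio.NewScanner(os.Stdin)
	enc := json.NewEncoder(os.Stdout)
	for scanner.Scan() {
		var req request
		if err := json.Unmarshal(scanner.Bytes(), &req); err != nil {
			fmt.Fprintln(os.Stderr, "bad request:", err)
			continue
		}
		enc.Encode(dispatch(req))
	}
}
```

Because the protocol is just framed JSON over pipes, the whole server stays a single small binary the client can spawn as a subprocess.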

Three Layers that Make It Work

  1. AST Parsing – Using Go’s go/ast package to extract:

    • Every function signature
    • Step‑by‑step behaviour
    • DB queries, HTTP calls, error‑handling patterns, side effects
    • Wrapped errors, deferred calls, goroutine launches (real syntax‑tree traversal, not regex)
  2. Semantic Vector Search – Each function/type gets a 384‑dimensional TF‑IDF embedding stored in SQLite.

    • Queries like “find authentication code” match semantically similar functions.
    • No external API calls; embeddings are computed locally.
  3. Call‑Graph Extraction – Builds a complete call graph (direct, goroutine, deferred).

    • Powers tools such as get_callers and visualize_call_graph (Mermaid diagrams).
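
To make layer 1 concrete: Go's standard go/parser and go/ast packages make this kind of extraction a tree walk rather than a regex hunt. A toy sketch that records each function's name and whether it launches a goroutine (the real server extracts far more, e.g. DB queries and deferred calls):

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// funcInfo is a toy slice of what an extractor might record per function.
type funcInfo struct {
	Name              string
	LaunchesGoroutine bool
}

// extract parses Go source and walks the syntax tree.
func extract(src string) ([]funcInfo, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "src.go", src, 0)
	if err != nil {
		return nil, err
	}
	var out []funcInfo
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Body == nil {
			continue
		}
		info := funcInfo{Name: fn.Name.Name}
		// Inspect visits every node in the body; a GoStmt is a `go` call.
		ast.Inspect(fn.Body, func(n ast.Node) bool {
			if _, isGo := n.(*ast.GoStmt); isGo {
				info.LaunchesGoroutine = true
			}
			return true
		})
		out = append(out, info)
	}
	return out, nil
}

func main() {
	src := `package demo
func Sync() {}
func Async() { go Sync() }`
	infos, err := extract(src)
	if err != nil {
		panic(err)
	}
	for _, fi := range infos {
		fmt.Printf("%s goroutine=%v\n", fi.Name, fi.LaunchesGoroutine)
	}
}
```

The same `ast.Inspect` pattern extends naturally to `DeferStmt`, call expressions, and error-wrapping patterns.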

Sample Tools (35 total)

| Category | Tool | Description |
| --- | --- | --- |
| Function Insight | get_function_context | Returns behaviour summary, execution steps, DB queries (actual SQL), HTTP calls (endpoints), error‑handling patterns, callers & callees. |
| Side‑Effect Search | search_by_side_effect | effect: "db_query" → every function touching the DB (with queries). Works for http_call, file_io, logging. |
| Concept Search | search_by_concept | concept: "authentication" → all auth‑related functions, powered by the semantic index. |
| Incremental Update | refresh_file | Re‑analyzes a single changed file in ~10 ms, updating stored context. |
| Visualization | visualize_call_graph | Generates a Mermaid flowchart of callers/callees at configurable depth. |

Token‑Usage Comparison

| Task | Agent‑Only (Claude) | MCP Server |
| --- | --- | --- |
| Understand a function | ~50 K tokens | ~4 K tokens |
| Find related code | ~30 K tokens | ~2‑3 K tokens |
| After editing a file | Full re‑explore | ~1‑2 K tokens |
| Natural‑language Q&A | Not possible | ~8 K tokens |

Result: a 10‑25× reduction in token usage per query. No more context‑limit hits; just fast, responsive AI assistance.

Implementation Details

  • Dependencies: only go-git (Git ops) and go-sqlite3 (vector storage).
    All other features (AST parsing, HTTP handling, JSON) use the Go standard library → tiny binary, minimal supply‑chain, trivial deployment.

  • Embeddings: Chose local TF‑IDF over OpenAI’s embedding API.

    • Sufficient quality for code search (function names & patterns are distinctive).
    • Works offline, zero latency, no API keys, no rate limits, no cost.
  • Result Pagination: Search returns compact references; each includes a detail_ref that the AI can call to expand.

    • AI gets a list of ~20 matches in ~2 K tokens, then fetches full details only for the 2‑3 it actually needs.
  • Concurrency: Analyses of different repos run concurrently; operations on the same repo serialize.

    • Avoids a global mutex, allowing parallel work across services.
  • Language Support: Deep AST analysis currently Go‑only. Other languages get a generic extractor (basic structure, no behaviour details).

    • Next step: add Tree‑Sitter parsers for Python & TypeScript.
  • Transport: STDIO only now. MCP spec also supports HTTP/SSE, which would let the server run as a long‑lived daemon shared across multiple AI sessions.

    • Presently each Claude Code session spawns its own server process.
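
To make the embedding choice concrete, here is a toy local embedder in the same spirit: term frequencies hashed into a fixed 384‑dimensional vector, compared by cosine similarity. The hashing trick and whitespace tokenizer are my simplification; the server's actual TF‑IDF scheme also weights terms by inverse document frequency.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
	"strings"
)

const dims = 384

// embed hashes whitespace-separated tokens into a fixed-size
// term-frequency vector. Everything runs locally: no API calls.
func embed(text string) [dims]float64 {
	var v [dims]float64
	for _, tok := range strings.Fields(strings.ToLower(text)) {
		h := fnv.New32a()
		h.Write([]byte(tok))
		v[h.Sum32()%dims]++
	}
	return v
}

// cosine returns the cosine similarity of two vectors.
func cosine(a, b [dims]float64) float64 {
	var dot, na, nb float64
	for i := 0; i < dims; i++ {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	auth := embed("validate jwt token and check user password")
	other := embed("render html template for the dashboard")
	query := embed("authentication token password")
	fmt.Printf("auth=%.2f other=%.2f\n", cosine(query, auth), cosine(query, other))
}
```

Even this crude version ranks the auth function above the template renderer for an auth query, which is why local embeddings are "sufficient quality" when function names and patterns are distinctive.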

Roadmap – Scaling to Organization‑Level Intelligence

Killer Feature: “What happens when someone hits /login?”
The server should trace the entire flow:

  • Request → Service A LoginHandler → Service B /auth/validate → Kafka user.verified → Service C VerificationHandler.

To achieve this we need:

  1. HTTP Client Call Detection – Extract outbound HTTP calls, infer target services/endpoints.
  2. Message‑Bus (Kafka) Flow Tracking – Identify producers/consumers, map topics to handlers.
  3. Cross‑Service Call Graph – Merge per‑repo graphs into a global view.
  4. Transport Upgrade – Implement HTTP/SSE for a persistent daemon.
  5. Multi‑Language AST – Add Tree‑Sitter parsers for Python & TypeScript to broaden coverage.
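
Roadmap item 1 is also AST work. A hedged sketch of spotting net/http client calls by walking selector expressions; inferring the target service would additionally need constant propagation on the URL argument, and calls on a custom *http.Client would need type information:

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// findHTTPCalls returns the package-qualified http.<Method> client
// calls (http.Get, http.Post, ...) found in the given Go source.
func findHTTPCalls(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "src.go", src, 0)
	if err != nil {
		return nil, err
	}
	var calls []string
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true
		}
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok {
			return true
		}
		// Match the receiver identifier `http`, e.g. http.Get(...).
		if pkg, ok := sel.X.(*ast.Ident); ok && pkg.Name == "http" {
			calls = append(calls, "http."+sel.Sel.Name)
		}
		return true
	})
	return calls, nil
}

func main() {
	src := `package demo
import "net/http"
func login() {
	http.Get("http://auth-service/auth/validate")
	http.Post("http://audit/events", "application/json", nil)
}`
	calls, err := findHTTPCalls(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(calls)
}
```

Once each repo reports its outbound endpoints and its registered routes, matching the two lists across repos yields the cross‑service edges the global call graph needs.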

TL;DR

  • Problem: AI agents waste tens of thousands of tokens just to understand a codebase.
  • Solution: Pre‑process the repo once, store rich AST‑derived metadata, and expose it via a lightweight MCP server.
  • Result: 2‑4 K tokens per query, offline operation, fast incremental updates, and a clear path to organization‑wide code intelligence.

MCP Repo Context – Recent Enhancements

Overview

The MCP server is evolving to give developers richer, cross‑repo insights without needing to run the code. Below is a concise summary of the new capabilities, tools, and usage instructions.

1. End‑to‑End Request Tracing

  • Static analysis now parses destination URLs, route registrations (e.g., gorilla/mux), and async message producers/consumers (Kafka, RabbitMQ, NATS).
  • The data is matched across repositories to build a service‑to‑service flow graph.
  • New tools:
    • trace_api_flow – traces a request from entry point to all downstream services.
    • get_service_map – visualizes the entire service connectivity map.

Static analysis for distributed tracing works on code that isn’t deployed yet, so no OpenTelemetry instrumentation is required.

2. Full Module Dependency Parsing

  • The server now reads go.mod files, handling:
    • Direct & indirect dependencies.
    • replace directives.
    • Import classification (stdlib vs. internal vs. external).
  • Tool: get_dependency_graph – outputs a Mermaid diagram showing how repositories depend on each other.

This forms the foundation for cross‑repo features.
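
As a rough sketch of what reading a go.mod involves, here is a naive line scanner that extracts the module path and require entries; production code would more likely use the golang.org/x/mod/modfile package, and the dependency names in the demo are illustrative:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// requirement is one require entry from a go.mod file.
type requirement struct {
	Path     string
	Version  string
	Indirect bool
}

// parseGoMod extracts the module path and requirements with a naive
// line scanner (it ignores replace/exclude directives).
func parseGoMod(src string) (module string, reqs []requirement) {
	inBlock := false
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, "module "):
			module = strings.TrimSpace(strings.TrimPrefix(line, "module "))
		case line == "require (":
			inBlock = true
		case inBlock && line == ")":
			inBlock = false
		case inBlock || strings.HasPrefix(line, "require "):
			line = strings.TrimPrefix(line, "require ")
			fields := strings.Fields(line)
			if len(fields) >= 2 {
				reqs = append(reqs, requirement{
					Path:     fields[0],
					Version:  fields[1],
					Indirect: strings.Contains(line, "// indirect"),
				})
			}
		}
	}
	return module, reqs
}

func main() {
	gomod := `module example.com/api

go 1.22

require (
	github.com/go-git/go-git/v5 v5.12.0
	github.com/example/dep v1.0.0 // indirect
)`
	mod, reqs := parseGoMod(gomod)
	fmt.Println(mod)
	for _, r := range reqs {
		fmt.Println(r.Path, r.Version, r.Indirect)
	}
}
```

With each repo's module path and requirements in hand, an edge from repo A to repo B exists whenever A requires B's module path, which is exactly what get_dependency_graph renders.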

3. Organization‑Wide Semantic Indexing

  • Repositories can be grouped under an organization model.
  • Added search_org tool that combines:
    • Keyword search.
    • Vector search.
    • Hybrid ranking via reciprocal rank fusion.

Example query: “find authentication code” → returns results from all 50+ repos, ranked by relevance.
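
Reciprocal rank fusion itself is only a few lines: each document's fused score is the sum of 1/(k + rank) over every result list it appears in, with k conventionally set to 60. A sketch (the file:function identifiers are made up):

```go
package main

import (
	"fmt"
	"sort"
)

const k = 60 // conventional RRF damping constant

// fuse merges several ranked result lists (best first) by reciprocal
// rank fusion: score(d) = sum over lists of 1/(k + rank), 1-based rank.
func fuse(lists ...[]string) []string {
	scores := map[string]float64{}
	for _, list := range lists {
		for rank, doc := range list {
			scores[doc] += 1.0 / float64(k+rank+1)
		}
	}
	docs := make([]string, 0, len(scores))
	for d := range scores {
		docs = append(docs, d)
	}
	// Highest fused score first.
	sort.Slice(docs, func(i, j int) bool { return scores[docs[i]] > scores[docs[j]] })
	return docs
}

func main() {
	keyword := []string{"auth.go:Login", "user.go:Create", "jwt.go:Sign"}
	vector := []string{"jwt.go:Sign", "auth.go:Login", "mw.go:Check"}
	fmt.Println(fuse(keyword, vector))
}
```

RRF needs no score calibration between the keyword and vector searches, only their ranks, which is why it is a popular choice for hybrid ranking.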

4. One‑Call PR Impact Analysis

Instead of multiple tool calls, agents can now invoke a single function:

  • analyze_pr_impact – returns:
    • Changed function behaviours.
    • Callers affected.
    • Cross‑service impact.
    • Dependency‑level impact.
    • Risk assessment.

Pre‑built recipes for the three most common workflows (PR impact analysis, API flow explanation, architecture review) stay within an 8 K token budget.

5. Extensible Analyzer & Embedder

  • Plugin interfaces introduced:
    • AnalyzerPlugin – add language support (TypeScript, Python, etc.).
    • EmbedderPlugin – swap embedding models.

Current experiment: Voyage Code‑3 – 16 % better than OpenAI on code retrieval.
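
The interface shapes below are my guess at what such plugin seams look like, inferred from the description above rather than taken from the repository:

```go
package main

import "fmt"

// AnalyzerPlugin adds language support; the method set is hypothetical.
type AnalyzerPlugin interface {
	Language() string
	Analyze(source []byte) ([]string, error) // e.g. extracted symbol names
}

// EmbedderPlugin lets the embedding model be swapped out.
type EmbedderPlugin interface {
	Name() string
	Embed(text string) ([]float64, error)
}

// tfidfEmbedder is a stub local embedder satisfying EmbedderPlugin.
type tfidfEmbedder struct{}

func (tfidfEmbedder) Name() string { return "local-tfidf" }

func (tfidfEmbedder) Embed(text string) ([]float64, error) {
	return make([]float64, 384), nil // placeholder 384-dim vector
}

func main() {
	var e EmbedderPlugin = tfidfEmbedder{}
	vec, _ := e.Embed("find authentication code")
	fmt.Println(e.Name(), len(vec))
}
```

The point of the seam is that swapping in a hosted model (say, Voyage) only means writing another small struct that satisfies EmbedderPlugin; the indexing and search code never changes.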

6. REST API & Multi‑Tenant Deployment

  • Core tools are now wrapped as a REST API with:
    • GitHub/GitLab webhook integration for auto‑analysis on push events.
    • Multi‑tenant storage for organization isolation.
    • Async analysis queuing.

Goal: Deploy once for an entire team instead of per‑developer.

7. Open‑Source Repository

https://github.com/yashpalsinhc/mcp-repo-context

8. Configuration for Claude Code

Add the following entry to your MCP config (JSON‑RPC over stdio):

{
  "mcpServers": {
    "repo-context": {
      "command": "path/to/mcp-repo-context",
      "args": ["--data-dir", "~/.mcp-data"]
    }
  }
}

9. Typical Claude Code Interaction

> Analyze my local project at /path/to/repo
> What does the CreateUser function do?
> Find all database operations
> Show me the call graph for HandleLogin

10. Takeaway

If you’re building AI‑powered developer tools, the MCP ecosystem is worth exploring:

  • Simple protocol (JSON‑RPC over stdio).
  • Go‑based server side is performant and easy to extend.
  • Turns expensive, slow AI exploration into fast, precise queries.

I use this server every day – it’s changed how I work with AI on code.
