I Built a 35-Tool MCP Server That Cut My AI Token Usage by 95%

Published: February 26, 2026 at 02:36 AM EST
7 min read
Source: Dev.to

Context‑Heavy AI Code Exploration – The Problem

Every time I asked Claude to help me with a codebase, the same thing happened:

  • The model read file after file, burning through 50 K+ tokens just to understand the project structure.
  • I hit the context limit before getting any real work done.

My Solution – MCP Repo Context Server

I built an MCP (Model Context Protocol) server that analyzes a codebase once, extracts everything an AI agent needs (function behaviours, call graphs, DB queries, HTTP calls, …), and serves precise answers in 2‑4 K tokens instead of 50 K+.

How It Works

When you point an AI agent at a codebase, it has no memory. Every session starts from scratch, runs grep, reads files one by one, and builds a mental model—slowly, expensively, and incompletely.

For a medium‑sized Go project (~100 files) a typical exploration burns:

| Issue | Tokens |
| --- | --- |
| Understanding what functions exist | ≈ 50 K |
| Multiple rounds of grep → read | |
| Missed cross‑file relationships (call graphs) | |

This isn’t an AI problem. It’s a context‑delivery problem.

Architecture

┌──────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  AI Agent    │────▶│  MCP Server      │────▶│  Storage Layer  │
│  (Claude)    │◀────│  (JSON‑RPC/stdio)│◀────│  JSON + SQLite  │
└──────────────┘     └──────────────────┘     └─────────────────┘

                    ┌───────┼───────┐
                    ▼       ▼       ▼
              AST Parser  Vector  Call Graph
              (Go)        Search  Builder
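
The middle box speaks plain JSON‑RPC over stdio. As a rough illustration of what that transport looks like in Go (the envelope follows JSON‑RPC 2.0, but the dispatch map and the `ping` method here are hypothetical, not the server's actual tool names):

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

// request and response mirror the JSON-RPC 2.0 envelope used by MCP.
type request struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      int             `json:"id"`
	Method  string          `json:"method"`
	Params  json.RawMessage `json:"params"`
}

type response struct {
	JSONRPC string `json:"jsonrpc"`
	ID      int    `json:"id"`
	Result  any    `json:"result,omitempty"`
	Error   string `json:"error,omitempty"`
}

// handlers maps method names to tool functions; the entries are illustrative.
var handlers = map[string]func(json.RawMessage) (any, error){
	"ping": func(json.RawMessage) (any, error) { return "pong", nil },
}

// dispatch routes one parsed request to its handler.
func dispatch(req request) response {
	h, ok := handlers[req.Method]
	if !ok {
		return response{JSONRPC: "2.0", ID: req.ID, Error: "method not found"}
	}
	result, err := h(req.Params)
	if err != nil {
		return response{JSONRPC: "2.0", ID: req.ID, Error: err.Error()}
	}
	return response{JSONRPC: "2.0", ID: req.ID, Result: result}
}

func main() {
	// Read newline-delimited JSON-RPC requests from stdin, answer on stdout.
	scanner := bufio.NewScanner(os.Stdin)
	enc := json.NewEncoder(os.Stdout)
	for scanner.Scan() {
		var req request
		if err := json.Unmarshal(scanner.Bytes(), &req); err != nil {
			fmt.Fprintln(os.Stderr, "bad request:", err)
			continue
		}
		enc.Encode(dispatch(req))
	}
}
```

Because the protocol is just framed JSON over pipes, the whole server stays a single small binary the client can spawn as a subprocess.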

Three Layers that Make It Work

  1. AST Parsing – Using Go’s go/ast package to extract:

    • Every function signature
    • Step‑by‑step behaviour
    • DB queries, HTTP calls, error‑handling patterns, side effects
    • Wrapped errors, deferred calls, goroutine launches (real syntax‑tree traversal, not regex)
  2. Semantic Vector Search – Each function/type gets a 384‑dimensional TF‑IDF embedding stored in SQLite.

    • Queries like “find authentication code” match semantically similar functions.
    • No external API calls; embeddings are computed locally.
  3. Call‑Graph Extraction – Builds a complete call graph (direct, goroutine, deferred).

    • Powers tools such as get_callers and visualize_call_graph (Mermaid diagrams).
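
To make layer 1 concrete: Go's standard go/parser and go/ast packages make this kind of extraction a tree walk rather than a regex hunt. A toy sketch that records each function's name and whether it launches a goroutine (the real server extracts far more, e.g. DB queries and deferred calls):

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// funcInfo is a toy slice of what an extractor might record per function.
type funcInfo struct {
	Name              string
	LaunchesGoroutine bool
}

// extract parses Go source and walks the syntax tree.
func extract(src string) ([]funcInfo, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "src.go", src, 0)
	if err != nil {
		return nil, err
	}
	var out []funcInfo
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Body == nil {
			continue
		}
		info := funcInfo{Name: fn.Name.Name}
		// Inspect visits every node in the body; a GoStmt is a `go` call.
		ast.Inspect(fn.Body, func(n ast.Node) bool {
			if _, isGo := n.(*ast.GoStmt); isGo {
				info.LaunchesGoroutine = true
			}
			return true
		})
		out = append(out, info)
	}
	return out, nil
}

func main() {
	src := `package demo
func Sync() {}
func Async() { go Sync() }`
	infos, err := extract(src)
	if err != nil {
		panic(err)
	}
	for _, fi := range infos {
		fmt.Printf("%s goroutine=%v\n", fi.Name, fi.LaunchesGoroutine)
	}
}
```

The same `ast.Inspect` pattern extends naturally to `DeferStmt`, call expressions, and error-wrapping patterns.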

Sample Tools (35 total)

| Category | Tool | Description |
| --- | --- | --- |
| Function Insight | get_function_context | Returns behaviour summary, execution steps, DB queries (actual SQL), HTTP calls (endpoints), error‑handling patterns, callers & callees. |
| Side‑Effect Search | search_by_side_effect | effect: "db_query" → every function touching the DB (with queries). Works for http_call, file_io, logging. |
| Concept Search | search_by_concept | concept: "authentication" → all auth‑related functions, powered by the semantic index. |
| Incremental Update | refresh_file | Re‑analyzes a single changed file in ~10 ms, updating stored context. |
| Visualization | visualize_call_graph | Generates a Mermaid flowchart of callers/callees at configurable depth. |

Token‑Usage Comparison

| Task | Agent‑Only (Claude) | MCP Server |
| --- | --- | --- |
| Understand a function | ~50 K tokens | ~4 K tokens |
| Find related code | ~30 K tokens | ~2‑3 K tokens |
| After editing a file | Full re‑explore | ~1‑2 K tokens |
| Natural‑language Q&A | Not possible | ~8 K tokens |

Result: a 10‑25× reduction in token usage per query. No more context‑limit hits; just fast, responsive AI assistance.

Implementation Details

  • Dependencies: only go-git (Git ops) and go-sqlite3 (vector storage).
    All other features (AST parsing, HTTP handling, JSON) use the Go standard library → tiny binary, minimal supply‑chain, trivial deployment.

  • Embeddings: Chose local TF‑IDF over OpenAI’s embedding API.

    • Sufficient quality for code search (function names & patterns are distinctive).
    • Works offline, zero latency, no API keys, no rate limits, no cost.
  • Result Pagination: Search returns compact references; each includes a detail_ref that the AI can call to expand.

    • AI gets a list of ~20 matches in ~2 K tokens, then fetches full details only for the 2‑3 it actually needs.
  • Concurrency: Analyses of different repos run concurrently; operations on the same repo serialize.

    • Avoids a global mutex, allowing parallel work across services.
  • Language Support: Deep AST analysis currently Go‑only. Other languages get a generic extractor (basic structure, no behaviour details).

    • Next step: add Tree‑Sitter parsers for Python & TypeScript.
  • Transport: STDIO only now. MCP spec also supports HTTP/SSE, which would let the server run as a long‑lived daemon shared across multiple AI sessions.

    • Presently each Claude Code session spawns its own server process.
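
To make the embedding choice concrete, here is a toy local embedder in the same spirit: term frequencies hashed into a fixed 384‑dimensional vector, compared by cosine similarity. The hashing trick and whitespace tokenizer are my simplification; the server's actual TF‑IDF scheme also weights terms by inverse document frequency.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
	"strings"
)

const dims = 384

// embed hashes whitespace-separated tokens into a fixed-size
// term-frequency vector. Everything runs locally: no API calls.
func embed(text string) [dims]float64 {
	var v [dims]float64
	for _, tok := range strings.Fields(strings.ToLower(text)) {
		h := fnv.New32a()
		h.Write([]byte(tok))
		v[h.Sum32()%dims]++
	}
	return v
}

// cosine returns the cosine similarity of two vectors.
func cosine(a, b [dims]float64) float64 {
	var dot, na, nb float64
	for i := 0; i < dims; i++ {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	auth := embed("validate jwt token and check user password")
	other := embed("render html template for the dashboard")
	query := embed("authentication token password")
	fmt.Printf("auth=%.2f other=%.2f\n", cosine(query, auth), cosine(query, other))
}
```

Even this crude version ranks the auth function above the template renderer for an auth query, which is why local embeddings are "sufficient quality" when function names and patterns are distinctive.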

Roadmap – Scaling to Organization‑Level Intelligence

Killer Feature: “What happens when someone hits /login?”
The server should trace the entire flow:

  • Request → Service A LoginHandler → Service B /auth/validate → Kafka user.verified → Service C VerificationHandler.

To achieve this we need:

  1. HTTP Client Call Detection – Extract outbound HTTP calls, infer target services/endpoints.
  2. Message‑Bus (Kafka) Flow Tracking – Identify producers/consumers, map topics to handlers.
  3. Cross‑Service Call Graph – Merge per‑repo graphs into a global view.
  4. Transport Upgrade – Implement HTTP/SSE for a persistent daemon.
  5. Multi‑Language AST – Add Tree‑Sitter parsers for Python & TypeScript to broaden coverage.
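
Roadmap item 1 is also AST work. A hedged sketch of spotting net/http client calls by walking selector expressions; inferring the target service would additionally need constant propagation on the URL argument, and calls on a custom *http.Client would need type information:

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// findHTTPCalls returns the package-qualified http.<Method> client
// calls (http.Get, http.Post, ...) found in the given Go source.
func findHTTPCalls(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "src.go", src, 0)
	if err != nil {
		return nil, err
	}
	var calls []string
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true
		}
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok {
			return true
		}
		// Match the receiver identifier `http`, e.g. http.Get(...).
		if pkg, ok := sel.X.(*ast.Ident); ok && pkg.Name == "http" {
			calls = append(calls, "http."+sel.Sel.Name)
		}
		return true
	})
	return calls, nil
}

func main() {
	src := `package demo
import "net/http"
func login() {
	http.Get("http://auth-service/auth/validate")
	http.Post("http://audit/events", "application/json", nil)
}`
	calls, err := findHTTPCalls(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(calls)
}
```

Once each repo reports its outbound endpoints and its registered routes, matching the two lists across repos yields the cross‑service edges the global call graph needs.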

TL;DR

  • Problem: AI agents waste tens of thousands of tokens just to understand a codebase.
  • Solution: Pre‑process the repo once, store rich AST‑derived metadata, and expose it via a lightweight MCP server.
  • Result: 2‑4 K tokens per query, offline operation, fast incremental updates, and a clear path to organization‑wide code intelligence.

MCP Repo Context – Recent Enhancements

Overview

The MCP server is evolving to give developers richer, cross‑repo insights without needing to run the code. Below is a concise summary of the new capabilities, tools, and usage instructions.

1. End‑to‑End Request Tracing

  • Static analysis now parses destination URLs, route registrations (e.g., gorilla/mux), and async message producers/consumers (Kafka, RabbitMQ, NATS).
  • The data is matched across repositories to build a service‑to‑service flow graph.
  • New tools:
    • trace_api_flow – traces a request from entry point to all downstream services.
    • get_service_map – visualizes the entire service connectivity map.

Static analysis for distributed tracing works on code that isn’t deployed yet, so no OpenTelemetry instrumentation is required.

2. Full Module Dependency Parsing

  • The server now reads go.mod files, handling:
    • Direct & indirect dependencies.
    • replace directives.
    • Import classification (stdlib vs. internal vs. external).
  • Tool: get_dependency_graph – outputs a Mermaid diagram showing how repositories depend on each other.

This forms the foundation for cross‑repo features.
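
As a rough sketch of what reading a go.mod involves, here is a naive line scanner that extracts the module path and require entries; production code would more likely use the golang.org/x/mod/modfile package, and the dependency names in the demo are illustrative:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// requirement is one require entry from a go.mod file.
type requirement struct {
	Path     string
	Version  string
	Indirect bool
}

// parseGoMod extracts the module path and requirements with a naive
// line scanner (it ignores replace/exclude directives).
func parseGoMod(src string) (module string, reqs []requirement) {
	inBlock := false
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, "module "):
			module = strings.TrimSpace(strings.TrimPrefix(line, "module "))
		case line == "require (":
			inBlock = true
		case inBlock && line == ")":
			inBlock = false
		case inBlock || strings.HasPrefix(line, "require "):
			line = strings.TrimPrefix(line, "require ")
			fields := strings.Fields(line)
			if len(fields) >= 2 {
				reqs = append(reqs, requirement{
					Path:     fields[0],
					Version:  fields[1],
					Indirect: strings.Contains(line, "// indirect"),
				})
			}
		}
	}
	return module, reqs
}

func main() {
	gomod := `module example.com/api

go 1.22

require (
	github.com/go-git/go-git/v5 v5.12.0
	github.com/example/dep v1.0.0 // indirect
)`
	mod, reqs := parseGoMod(gomod)
	fmt.Println(mod)
	for _, r := range reqs {
		fmt.Println(r.Path, r.Version, r.Indirect)
	}
}
```

With each repo's module path and requirements in hand, an edge from repo A to repo B exists whenever A requires B's module path, which is exactly what get_dependency_graph renders.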

3. Organization‑Wide Semantic Indexing

  • Repositories can be grouped under an organization model.
  • Added search_org tool that combines:
    • Keyword search.
    • Vector search.
    • Hybrid ranking via reciprocal rank fusion.

Example query: “find authentication code” → returns results from all 50+ repos, ranked by relevance.
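
Reciprocal rank fusion itself is only a few lines: each document's fused score is the sum of 1/(k + rank) over every result list it appears in, with k conventionally set to 60. A sketch (the file:function identifiers are made up):

```go
package main

import (
	"fmt"
	"sort"
)

const k = 60 // conventional RRF damping constant

// fuse merges several ranked result lists (best first) by reciprocal
// rank fusion: score(d) = sum over lists of 1/(k + rank), 1-based rank.
func fuse(lists ...[]string) []string {
	scores := map[string]float64{}
	for _, list := range lists {
		for rank, doc := range list {
			scores[doc] += 1.0 / float64(k+rank+1)
		}
	}
	docs := make([]string, 0, len(scores))
	for d := range scores {
		docs = append(docs, d)
	}
	// Highest fused score first.
	sort.Slice(docs, func(i, j int) bool { return scores[docs[i]] > scores[docs[j]] })
	return docs
}

func main() {
	keyword := []string{"auth.go:Login", "user.go:Create", "jwt.go:Sign"}
	vector := []string{"jwt.go:Sign", "auth.go:Login", "mw.go:Check"}
	fmt.Println(fuse(keyword, vector))
}
```

RRF needs no score calibration between the keyword and vector searches, only their ranks, which is why it is a popular choice for hybrid ranking.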

4. One‑Call PR Impact Analysis

Instead of multiple tool calls, agents can now invoke a single function:

  • analyze_pr_impact – returns:
    • Changed function behaviours.
    • Callers affected.
    • Cross‑service impact.
    • Dependency‑level impact.
    • Risk assessment.

Pre‑built recipes for the three most common workflows (PR impact analysis, API flow explanation, architecture review) stay within an 8 K token budget.

5. Extensible Analyzer & Embedder

  • Plugin interfaces introduced:
    • AnalyzerPlugin – add language support (TypeScript, Python, etc.).
    • EmbedderPlugin – swap embedding models.

Current experiment: Voyage Code‑3 – 16 % better than OpenAI on code retrieval.
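
The interface shapes below are my guess at what such plugin seams look like, inferred from the description above rather than taken from the repository:

```go
package main

import "fmt"

// AnalyzerPlugin adds language support; the method set is hypothetical.
type AnalyzerPlugin interface {
	Language() string
	Analyze(source []byte) ([]string, error) // e.g. extracted symbol names
}

// EmbedderPlugin lets the embedding model be swapped out.
type EmbedderPlugin interface {
	Name() string
	Embed(text string) ([]float64, error)
}

// tfidfEmbedder is a stub local embedder satisfying EmbedderPlugin.
type tfidfEmbedder struct{}

func (tfidfEmbedder) Name() string { return "local-tfidf" }

func (tfidfEmbedder) Embed(text string) ([]float64, error) {
	return make([]float64, 384), nil // placeholder 384-dim vector
}

func main() {
	var e EmbedderPlugin = tfidfEmbedder{}
	vec, _ := e.Embed("find authentication code")
	fmt.Println(e.Name(), len(vec))
}
```

The point of the seam is that swapping in a hosted model (say, Voyage) only means writing another small struct that satisfies EmbedderPlugin; the indexing and search code never changes.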

6. REST API & Multi‑Tenant Deployment

  • Core tools are now wrapped as a REST API with:
    • GitHub/GitLab webhook integration for auto‑analysis on push events.
    • Multi‑tenant storage for organization isolation.
    • Async analysis queuing.

Goal: Deploy once for an entire team instead of per‑developer.

7. Open‑Source Repository

https://github.com/yashpalsinhc/mcp-repo-context

8. Configuration for Claude Code

Add the following entry to your MCP config (JSON‑RPC over stdio):

{
  "mcpServers": {
    "repo-context": {
      "command": "path/to/mcp-repo-context",
      "args": ["--data-dir", "~/.mcp-data"]
    }
  }
}

9. Typical Claude Code Interaction

> Analyze my local project at /path/to/repo
> What does the CreateUser function do?
> Find all database operations
> Show me the call graph for HandleLogin

10. Takeaway

If you’re building AI‑powered developer tools, the MCP ecosystem is worth exploring:

  • Simple protocol (JSON‑RPC over stdio).
  • Go‑based server side is performant and easy to extend.
  • Turns expensive, slow AI exploration into fast, precise queries.

I use this server every day – it’s changed how I work with AI on code.
