How I Built an MCP Server That Lets Claude Code Talk to Every LLM I Pay For

Published: February 5, 2026 at 10:53 PM EST
6 min read
Source: Dev.to

The Problem

You’re juggling four different AI ecosystems:

  • ChatGPT Plus
  • Claude MAX
  • Gemini
  • Local models via Ollama

Each requires its own browser tab, and you spend valuable time copy‑pasting prompts, re‑formatting, and losing context just to compare answers. A simple “what does each model think?” ends up taking minutes instead of seconds.

The Solution: HydraMCP

HydraMCP is an MCP (Model Context Protocol) server that routes a single prompt to any of the cloud or local models you have access to, queries them in parallel, and returns the results side‑by‑side.

Key Benefits

  • One interface for all models
  • Parallel execution (2‑5 models at once)
  • No more context loss or manual re‑formatting
  • Works from the terminal, IDE, or any Claude Code environment

Exposed Tools (Claude Code)

Tool            Description
list_models     Shows every model available across all providers in one command.
ask_model       Queries a single model (e.g., “Give me GPT‑5’s take on this”) without leaving your terminal.
compare_models  Sends the same prompt to 2‑5 models in parallel and returns the results side‑by‑side.

Example Usage

# List everything you can query
list_models
# Ask a single model
ask_model gpt-5-codex "Explain the difference between supervised and unsupervised learning."
# Compare several models at once
compare_models gpt-5-codex, gemini-3, claude-sonnet, local-qwen \
    "Review this function and suggest improvements:

def foo(bar):
    return bar * 2"

Result (simplified view):

Model          Response
GPT‑5 Codex    …
Gemini‑3       …
Claude Sonnet  …
Local Qwen     …

Each answer appears next to the others, making it trivial to spot differences, strengths, or errors.

Getting Started

  1. Install HydraMCP (via pip, Docker, or binary).

  2. Configure your API keys for OpenAI, Anthropic, Google, and your Ollama endpoint.

  3. Run the server:

    hydramcp serve
  4. Use any of the three tools from Claude Code, your terminal, or an IDE plugin.

TL;DR

HydraMCP lets you query all your AI models with a single prompt, get parallel, side‑by‑side results, and finally stop the endless copy‑paste dance. 🚀

Model Comparison (4 models, 11 637 ms total)

Model                       Latency             Tokens
gpt-5-codex                 1 630 ms (fastest)  194
gemini-3-pro-preview        11 636 ms           1 235
claude-sonnet-4-5-20250929  3 010 ms            202
ollama/qwen2.5-coder:14b    8 407 ms            187

All four independently found the same async bug, then each caught something the others missed:

  • GPT‑5 – fastest.
  • Gemini – most thorough.
  • Claude – clearest fix.
  • Qwen – explained the root cause.

Different training data, different strengths.

Consensus Polling

“Consensus polls 3‑7 models on a question and has a separate judge model evaluate whether they actually agree. It returns a confidence score and groups responses by agreement.”

A related synthesise mode goes a step further: it fans the prompt out to multiple models, collects their responses, and has a synthesiser model combine the best insights into one answer. The result is usually better than any individual response.
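The confidence score falls out of the judge's groupings directly. A minimal sketch, assuming a simple largest-group ratio (the grouping itself comes from the judge model; the scoring rule here is an illustration, not HydraMCP's actual formula):

```typescript
// One group per distinct position, as returned by the judge model.
interface AgreementGroup {
  position: string; // short label, e.g. "start with a monolith"
  models: string[]; // models whose answers fall into this group
}

// Confidence = share of models in the largest agreement group.
function consensusConfidence(groups: AgreementGroup[]): number {
  const total = groups.reduce((n, g) => n + g.models.length, 0);
  if (total === 0) return 0;
  const largest = Math.max(...groups.map((g) => g.models.length));
  return largest / total;
}
```

Under this rule, the 3/3 agreement shown later scores 1.0, while a 2-vs-1 split scores about 0.67.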

Architecture Overview

Claude Code
    |
    HydraMCP (MCP Server)
    |
    Provider Interface
    |-- CLIProxyAPI  → cloud models (GPT, Gemini, Claude, …)
    |-- Ollama       → local models (your hardware)

HydraMCP sits between Claude Code and your model providers. It communicates over stdio using JSON‑RPC (the MCP protocol), routes requests to the appropriate backend, and formats everything to keep your context window manageable.
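Concretely, a tool invocation on the wire is an ordinary JSON-RPC 2.0 request. A sketch of the envelope (the makeToolCall helper is hypothetical; the tools/call method with a name/arguments payload follows the MCP spec):

```typescript
// Minimal JSON-RPC 2.0 request envelope for an MCP tool call.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: Record<string, unknown>;
}

function makeToolCall(
  id: number,
  tool: string,
  args: Record<string, unknown>
): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: tool, arguments: args },
  };
}

// Frames go to stdout, one JSON object per line; all logging must go to stderr.
const msg = makeToolCall(1, "ask_model", { model: "gpt-5-codex", prompt: "hi" });
process.stdout.write(JSON.stringify(msg) + "\n");
```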

Provider Interface

interface Provider {
  name: string;
  healthCheck(): Promise<boolean>;
  listModels(): Promise<string[]>;
  query(
    model: string,
    prompt: string,
    options?: QueryOptions
  ): Promise<QueryResult>;
}

Every backend implements only three methods: healthCheck(), listModels(), and query(). Adding a new provider means implementing those three functions and registering it.
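As an illustration, here is a toy in-memory provider satisfying the interface. The echo backend and its model name are invented for the example; a real provider would call an HTTP endpoint instead:

```typescript
interface QueryOptions { temperature?: number; }
interface QueryResult { model: string; text: string; }

interface Provider {
  name: string;
  healthCheck(): Promise<boolean>;
  listModels(): Promise<string[]>;
  query(model: string, prompt: string, options?: QueryOptions): Promise<QueryResult>;
}

// Toy provider: "answers" by echoing the prompt back.
class EchoProvider implements Provider {
  name = "echo";
  async healthCheck(): Promise<boolean> { return true; }
  async listModels(): Promise<string[]> { return ["echo-1"]; }
  async query(model: string, prompt: string): Promise<QueryResult> {
    return { model, text: `echo: ${prompt}` };
  }
}
```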

  • Cloud models – CLIProxyAPI turns existing subscriptions into a local OpenAI‑compatible API. You authenticate once per provider through a browser login; no per‑token billing – you use the subscriptions you already pay for.
  • Local models – Ollama runs on localhost and provides models like Qwen, Llama, and Mistral. Zero API keys, zero cost beyond electricity.

When you compare four models, all queries fire simultaneously using Promise.allSettled().
Total time = latency of the slowest model, not the sum of all latencies.
(The four‑model comparison above took 11.6 s total, not the ~25 s its latencies sum to.)

If one model fails, you still get results from the others – graceful degradation instead of all‑or‑nothing.
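The fan-out can be sketched in a few lines. queryModel here is a stand-in for the real provider call, with one deliberately failing backend to show the degradation behaviour:

```typescript
interface ModelAnswer { model: string; text: string; }

// Stand-in for a real provider call; "bad-model" simulates a failing backend.
async function queryModel(model: string, prompt: string): Promise<ModelAnswer> {
  if (model === "bad-model") throw new Error(`${model} unavailable`);
  return { model, text: `${model} says: ${prompt}` };
}

// Fire all queries at once; a failure becomes an error entry, not a thrown error.
async function compareModels(models: string[], prompt: string) {
  const settled = await Promise.allSettled(models.map((m) => queryModel(m, prompt)));
  return settled.map((s, i) =>
    s.status === "fulfilled"
      ? { model: models[i], ok: true, text: s.value.text }
      : { model: models[i], ok: false, text: String(s.reason) }
  );
}
```

Promise.allSettled (unlike Promise.all) never rejects, which is exactly what makes the partial results possible.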

Agreement Detection

Naïve keyword matching fails to determine if models truly agree.

Model A: “Start with a monolith.”
Model B: “Monolith because it’s simpler.”

They agree semantically, but keyword overlap is low.
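To make the failure concrete, here is a naive Jaccard word-overlap check on those two answers (a simplification of whatever the real fallback does):

```typescript
// Jaccard similarity over lowercased word sets: |A ∩ B| / |A ∪ B|.
function keywordOverlap(a: string, b: string): number {
  const words = (s: string) =>
    new Set(s.toLowerCase().replace(/[^a-z\s]/g, "").split(/\s+/).filter(Boolean));
  const wa = words(a);
  const wb = words(b);
  const shared = [...wa].filter((w) => wb.has(w)).length;
  const union = new Set([...wa, ...wb]).size;
  return union === 0 ? 0 : shared / union;
}

const score = keywordOverlap(
  "Start with a monolith.",
  "Monolith because it's simpler."
);
// Only "monolith" is shared, so the score stays low despite semantic agreement.
```

The score comes out around 0.14 (1 shared word out of 7 distinct ones) even though both answers recommend the same thing.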

Solution: The consensus tool picks a judge model (not in the poll) and asks it to evaluate agreement. The judge reads all responses and groups them semantically.

Three cloud models polled, local Qwen judging.
Strategy: majority (needed 2/3)
Agreement: 3/3 models (100%)
Judge latency: 686 ms

Using a local model as judge means zero cloud quota for the evaluation step.

The keyword‑based fallback (when no judge is available) works for factual questions but breaks on subjective ones. The LLM‑judge approach is significantly better, though still an area for improvement.

Cold‑Start Penalty for Local Models

  • First request to Qwen 32B: ~24 s (model loading).
  • By the fourth request: ~3 s – roughly 8× faster once the model is warm.

If you use HydraMCP regularly, your local models stay warm and the experience is seamless. The first query of the day may be slow; everything after that is fast.

Synthesise Tool

The most ambitious feature: it collects responses from multiple models, then feeds them to a synthesiser model with instructions to combine the best insights and drop filler.

  • The synthesiser is deliberately a model not in the source list, when possible.
  • Prompt (simplified):
Here are responses from four models. Write one definitive answer.
Take the best from each.

In practice, the synthesized result usually has better structure than any individual response and catches details that at least one model missed.
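Assembling the synthesiser's input is mostly string concatenation. A sketch under the simplified prompt above (buildSynthesisPrompt is a hypothetical helper, not HydraMCP's actual code):

```typescript
interface SourceAnswer { model: string; text: string; }

// Build the prompt handed to the synthesiser model.
function buildSynthesisPrompt(answers: SourceAnswer[]): string {
  const sources = answers
    .map((a, i) => `--- Response ${i + 1} (${a.model}) ---\n${a.text}`)
    .join("\n\n");
  return (
    `Here are responses from ${answers.length} models. ` +
    `Write one definitive answer. Take the best from each.\n\n${sources}`
  );
}
```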

Implementation Details

  • Language: TypeScript (~1 500 lines)
  • Dependencies:
    • @modelcontextprotocol/sdk – MCP protocol
    • zod – input validation
    • Node 18+ (no Express, no database, no extra build framework)

All tool inputs are validated with Zod schemas, and all logging goes to stderr (stdout is reserved for the JSON‑RPC protocol – sending anything else there breaks MCP).
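The real project validates with zod; here is a dependency-free sketch of the same two constraints (2–5 models for a comparison, non-empty prompt) plus stderr-only logging. The rules and helper names are my assumptions for illustration:

```typescript
interface CompareInput { models: string[]; prompt: string; }

// Mirrors what a zod schema would enforce: 2-5 model names, non-empty prompt.
// Returns an error message, or null when the input is valid.
function validateCompareInput(input: CompareInput): string | null {
  if (input.models.length < 2 || input.models.length > 5) {
    return "models must contain between 2 and 5 entries";
  }
  if (input.prompt.trim().length === 0) {
    return "prompt must not be empty";
  }
  return null;
}

// Logging goes to stderr; stdout is reserved for JSON-RPC frames.
function log(message: string): void {
  process.stderr.write(`[hydramcp] ${message}\n`);
}
```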

Quick‑Start Guide (≈ 5 minutes)

  1. Set up backends – CLIProxyAPI and/or Ollama.

  2. Clone, install, build HydraMCP.

  3. Add backend URLs to .env.

  4. Register with Claude Code

    claude mcp add hydramcp -s user -- node /path/to/dist/index.js
  5. Restart Claude Code, then run list_models.

  6. From there you just talk naturally:

    • “Ask GPT‑5 to review this.”
    • “Compare three models on this approach.”
    • “Get consensus on whether this is thread‑safe.”

Claude Code routes everything through HydraMCP automatically.

Extending the Provider Interface

The design is intentionally extensible. Future backends I’d like to see:

  • LM Studio – another local model option.
  • OpenRouter – pay‑per‑token access to models you don’t subscribe to.
  • Direct API keys – OpenAI, Anthropic, Google (without CLIProxyAPI).

Each new provider is roughly 100 lines of TypeScript: implement the three interface methods, register it, and you’re done.
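The "register it" step could be as simple as a map keyed by provider name. A sketch, not HydraMCP's actual registry:

```typescript
interface Provider {
  name: string;
  healthCheck(): Promise<boolean>;
  listModels(): Promise<string[]>;
}

const providers = new Map<string, Provider>();

function registerProvider(p: Provider): void {
  if (providers.has(p.name)) {
    throw new Error(`provider ${p.name} already registered`);
  }
  providers.set(p.name, p);
}

// Aggregate listModels across every registered backend.
async function listAllModels(): Promise<string[]> {
  const lists = await Promise.all(
    [...providers.values()].map((p) => p.listModels())
  );
  return lists.flat();
}
```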

Bottom Line

The real value isn’t any single feature. It’s the workflow change: instead of trusting one model’s opinion, you can cheaply verify it against several others, get a consensus, and synthesize a higher‑quality answer—all without leaving your editor or terminal.

Multi‑Model AI Comparison

Instead of wondering whether GPT or Claude is better for a specific task, you can simply compare them side‑by‑side and see the results.

Why Use Multiple Models?

  • Different strengths:

    • GPT‑5 often catches performance issues that Claude misses.
    • Claude can suggest architectural patterns that GPT doesn’t consider.
    • Gemini sometimes provides the most thorough analysis.
    • Local Qwen is surprisingly good at explaining why something is wrong, not just what is wrong.
  • Unified terminal access:
    Having all of them available from a single terminal, with parallel execution and structured comparison, changes how you think about using AI for code. It shifts the workflow from “ask my preferred model” to “ask the right model for this task”—or simply “ask all of them and see what shakes out.”

HydraMCP

If you have subscriptions collecting dust or local models sitting idle, HydraMCP puts them to work. And if you want to add a new provider, the interface is documented and the examples are provided.
