Introducing Virtual MCP Server: Unified Gateway for Multi-MCP Workflows

Published: December 11, 2025 at 10:59 AM EST
5 min read
Source: Dev.to

The problem: connection overload

Picture this: you’re an engineer on a platform team. Your AI assistant needs access to GitHub for code, Jira for tickets, Slack for notifications, PagerDuty for incidents, Datadog for metrics, AWS for infrastructure, Confluence for docs, and your internal knowledge base. That’s 8 separate MCP server connections, each exposing 10‑20+ tools. Now your AI’s context window is filling up with 80+ tool descriptions, burning tokens and degrading performance as the LLM struggles to select the right tools from an overwhelming list.

Each MCP server connection requires:

  • Individual configuration in your AI client
  • Separate authentication credentials
  • Manual coordination when tasks span multiple systems
  • Repeated parameter entry (same repo, same channel, same database)
  • Tool filtering to avoid context bloat and wasted tokens

Want to investigate a production incident? You’re manually running commands across 4 different systems and piecing together the results yourself. Deploying an app? You’re orchestrating a sequence of operations: merge PR, wait for CI, get approval, deploy, notify team. It’s tedious, error‑prone, and not reusable.

The solution: aggregate everything

vMCP transforms those 8 connections into one. You configure a single MCP endpoint that aggregates all your backend servers.

Before vMCP:

{
  "servers": {
    "github": { "url": "..." },
    "jira": { "url": "..." },
    "slack": { "url": "..." },
    "pagerduty": { "url": "..." },
    "datadog": { "url": "..." },
    "aws": { "url": "..." },
    "confluence": { "url": "..." },
    "docs": { "url": "..." }
  }
}

With vMCP:

{
  "servers": {
    "company-tools": {
      "url": "http://vmcp.company.com/mcp"
    }
  }
}

One connection. One authentication flow. All your tools available.

You can run as many vMCP instances as you need. Your frontend team connects to one vMCP with their specific tools; your platform team connects to another with infrastructure access. Each vMCP aggregates exactly the backends that each team needs, with appropriate security policies and permissions. This improves security (no more giving everyone access to everything) and efficiency (fewer tools means smaller context windows, lower token costs, and better AI performance).

What vMCP does

vMCP is part of the ToolHive Kubernetes Operator. It acts as an intelligent aggregation layer that sits between your AI client and your backend MCP servers.

Diagram of the basic vMCP architecture

1. Multi‑server aggregation with tool filtering

All MCP tools appear through a single endpoint, but you cherry‑pick exactly which tools to expose.

  • Example: an engineer on the ToolHive team gets a single vMCP connection with:
    • GitHub’s search_code tool (scoped to the stacklok/toolhive repo only)
    • The ToolHive docs MCP server
    • An internal docs server hooked up to Google Drive and filtered to ToolHive design docs
    • Slack (only the #toolhive-team channel)

No irrelevant tools clutter the LLM’s context, and no tokens are wasted on unused tool descriptions.

When multiple MCP servers have tools with the same name (e.g., both GitHub and Jira have create_issue), vMCP automatically prefixes them (github_create_issue, jira_create_issue). You can customize these names as needed.
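The post doesn't show how this disambiguation is implemented internally; as a conceptual sketch only (function and parameter names are invented, not ToolHive's API), an aggregator could resolve collisions like this:

```python
def disambiguate(tools_by_server, overrides=None):
    """Map exposed tool names to (server, original_name) pairs.

    Only names that appear on more than one server get the server prefix;
    `overrides` lets an operator rename any exposed tool.
    """
    overrides = overrides or {}
    # Count how many servers expose each bare tool name.
    counts = {}
    for tools in tools_by_server.values():
        for name in tools:
            counts[name] = counts.get(name, 0) + 1
    exposed = {}
    for server, tools in tools_by_server.items():
        for name in tools:
            # Prefix only colliding names, then apply any custom rename.
            full = f"{server}_{name}" if counts[name] > 1 else name
            exposed[overrides.get(full, full)] = (server, name)
    return exposed
```

With both GitHub and Jira exposing `create_issue`, the client would see `github_create_issue` and `jira_create_issue`, while a unique name like `search_code` passes through unprefixed.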

2. Declarative multi‑system workflows

Real tasks often require coordinating across multiple systems. vMCP lets you define deterministic workflows with parallel steps, conditionals, error handling, and approval gates.

Incident investigation workflow

Without vMCP, each step is a manual command:

→ Query logs from the logging system
→ Fetch metrics from the monitoring platform
→ Pull traces from the tracing service
→ Check infrastructure status from the cloud provider
→ Manually combine everything into a report
→ Create a Jira ticket with the findings
vMCP runs the queries in parallel, aggregates the data, and creates the ticket. Define the workflow once and reuse it for every incident.
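The post doesn't show vMCP's actual workflow syntax, so as an illustration only (every field name here is invented), a declarative definition along these lines could capture the parallel fan-in:

```json
{
  "name": "investigate-incident",
  "steps": [
    { "id": "logs",    "tool": "logging_query_logs",       "parallel": true },
    { "id": "metrics", "tool": "monitoring_get_metrics",   "parallel": true },
    { "id": "traces",  "tool": "tracing_get_traces",       "parallel": true },
    { "id": "infra",   "tool": "cloud_get_status",         "parallel": true },
    { "id": "ticket",  "tool": "jira_create_issue",
      "depends_on": ["logs", "metrics", "traces", "infra"] }
  ]
}
```

The four queries have no dependencies on each other, so they can run concurrently; only the ticket-creation step waits for all of them.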

App deployment workflow

→ Merge pull request in GitHub
→ Wait for CI tests to pass
→ Request human approval (using MCP elicitation)
→ Deploy (only if approved)
→ Notify team in Slack
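In the same hypothetical notation as above (field names invented for illustration, not vMCP's real schema), the deployment sequence is a linear chain with a human approval gate:

```json
{
  "name": "deploy-app",
  "steps": [
    { "id": "merge",   "tool": "github_merge_pull_request" },
    { "id": "ci",      "tool": "github_wait_for_checks",  "depends_on": ["merge"] },
    { "id": "approve", "type": "elicitation",
      "prompt": "CI passed. Deploy to production?",       "depends_on": ["ci"] },
    { "id": "deploy",  "tool": "deploy_service",
      "depends_on": ["approve"], "condition": "approve.approved" },
    { "id": "notify",  "tool": "slack_post_message",      "depends_on": ["deploy"] }
  ]
}
```

The elicitation step pauses the workflow until a human responds, and the conditional on the deploy step makes "only if approved" explicit rather than relying on the LLM to remember it.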

3. Pre‑configured defaults and guardrails

Stop typing the same parameters repeatedly. Configure defaults once in vMCP.

Before: Every GitHub query requires specifying repo: stacklok/toolhive
After: The repo is pre‑configured, preventing accidental queries to the wrong repository.

Pre‑configuring parameters ensures deterministic behavior, security, and consistency across all users.
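As a minimal sketch of the idea (not ToolHive's implementation; names are illustrative), merging operator-configured defaults into each tool call, with some parameters locked so callers cannot override them, might look like:

```python
def apply_defaults(args, defaults, locked=()):
    """Merge vMCP-configured defaults into a tool call's arguments.

    defaults: values filled in when the caller omits them.
    locked: parameters the caller may not override (guardrail).
    """
    merged = {**defaults, **args}  # caller-supplied args win for unlocked keys
    for key in locked:
        if key in args and args[key] != defaults.get(key):
            raise ValueError(f"parameter {key!r} is fixed to {defaults[key]!r}")
        merged[key] = defaults[key]
    return merged
```

Locking `repo` to `stacklok/toolhive` means a misphrased prompt can't silently query the wrong repository, which is the deterministic-behavior guarantee the section describes.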

4. Tool customization and security policies

Third‑party MCP servers often expose generic, unrestricted tools. vMCP lets you wrap and restrict them without modifying upstream servers.

  • Security policy enforcement – Restrict a website‑fetch tool to internal domains only (*.company.com), validate URLs before calling the backend, and provide clear error messages for violations.
  • Simplified interfaces – Wrap a complex AWS EC2 tool that has 20+ parameters, exposing only the three parameters your frontend team needs, with safe defaults for everything else.
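The domain-restriction policy in the first bullet can be sketched as a pre-call check (a conceptual example, not vMCP's actual policy engine; the allowlist values are assumptions):

```python
import fnmatch
from urllib.parse import urlparse

# Assumed internal domains for this example.
ALLOWED_HOSTS = ["*.company.com", "company.com"]

def check_fetch_url(url):
    """Reject fetch targets outside the internal allowlist before the
    backend tool is ever called, with a clear error for violations."""
    host = urlparse(url).hostname or ""
    if not any(fnmatch.fnmatch(host, pat) for pat in ALLOWED_HOSTS):
        raise PermissionError(f"{host or url} is not an allowed internal domain")
    return url
```

Because the check runs in the aggregation layer, the upstream fetch tool stays unmodified; `docs.company.com` passes while an external host is rejected before any request is made.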

5. Centralized authentication

vMCP implements a two‑boundary authentication model with a complete audit trail. Your AI client authenticates once to vMCP using the OAuth 2.1 methods defined in the official MCP spec. vMCP then handles authorization to each backend independently based on its requirements.

When you need to revoke access, disable the user in your identity provider and all backend access is revoked instantly.
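The two boundaries can be pictured with a small sketch (purely conceptual; the names and credential store are invented, and real token validation would go through the identity provider):

```python
# Per-backend secrets held by vMCP; they are never exposed to the AI client.
BACKEND_CREDENTIALS = {
    "github": {"token": "ghp_example"},
    "jira":   {"token": "jira_example"},
}

def route_call(client_identity, backend, tool, args):
    """Boundary 1: the client has already authenticated to vMCP (e.g. via
    OAuth 2.1) and been mapped to client_identity. Boundary 2: vMCP
    attaches its own credential for the target backend."""
    if client_identity is None:
        raise PermissionError("client not authenticated")
    cred = BACKEND_CREDENTIALS.get(backend)
    if cred is None:
        raise KeyError(f"no credential configured for backend {backend!r}")
    return {"backend": backend, "tool": tool, "args": args, "auth": cred}
```

Since every backend call passes through boundary 1, disabling the user at the identity provider cuts off all backends at once, which is the instant-revocation property described above.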

Real‑world benefits

Take the incident investigation workflow above as an example.

Without vMCP

  • 4 sequential manual commands
  • 2‑3 minutes per command
  • 5‑10 minutes aggregating and formatting
  • 15‑20 minutes total per incident
  • Results vary by engineer; process isn’t documented or reusable

With vMCP

  • One command triggers the workflow
  • Parallel execution: ~30 seconds
  • Automatic aggregation and formatting
  • Consistent results every time
  • Workflow is documented as code; any team member can use it

For a team handling 20 incidents per week, the time savings translate to roughly 5 hours saved weekly, along with reduced token costs and more reliable incident response.
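A back-of-envelope check of that figure, using the numbers quoted above:

```python
# Midpoint of the 15-20 minute manual estimate vs. the ~30 s automated run.
incidents_per_week = 20
manual_minutes = 17.5
automated_minutes = 0.5

saved_hours = incidents_per_week * (manual_minutes - automated_minutes) / 60
# 20 * 17 / 60 ≈ 5.7 hours per week, consistent with "roughly 5 hours"
```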
