Agentgateway Review: A Feature-Rich New AI Gateway
Source: Dev.to
Introduction
agentgateway is a data plane developed by Solo.io specifically for AI scenarios. Written in Rust, it can be configured via xDS (a gRPC-based protocol) or YAML. kgateway recently replaced its Envoy-based AI data plane with the open-source version of agentgateway; the enterprise version of Gloo is expected to follow.
The gateway supports four AI scenarios:
- MCP
- A2A
- Proxying inference requests to LLM providers
- Load balancing for inference services
Below is an overview of each scenario. Note that the discussion focuses on the open‑source agentgateway; some features may be exclusive to the enterprise edition.
MCP
Agentgateway was originally created to handle stateful Model Context Protocol (MCP) requests, which were difficult to manage with existing Envoy data planes. Consequently, its MCP support is the most complete.
Session handling
- By default, MCP is treated as a stateful protocol.
- A `SessionManager` struct manages session creation and maintenance (code link).
- The `SessionManager` is an in-process store, so multiple agentgateway instances do not share session state.
- For sticky sessions toward upstreams, it is simpler to consistent-hash on the `MCP-Session-ID` header, ensuring the same session ID routes to the same backend even across different instances (see the sketch below).
- Extending `SessionManager` to use a remote store is possible but adds overhead.
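Consistent hashing on the session header is a generic technique rather than anything agentgateway-specific. A minimal Rust sketch of the idea, using rendezvous (highest-random-weight) hashing over hypothetical backend addresses, could look like this:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Pick a backend for a given MCP session ID using rendezvous hashing, so the
/// same session ID maps to the same backend as long as the backend set is
/// unchanged. Backend addresses here are purely illustrative.
fn pick_backend<'a>(session_id: &str, backends: &[&'a str]) -> Option<&'a str> {
    backends
        .iter()
        .max_by_key(|backend| {
            // Score each (session, backend) pair; the highest score wins.
            // DefaultHasher is not stable across Rust releases; a deployment
            // that needs separate instances to agree would use a fixed hash.
            let mut hasher = DefaultHasher::new();
            session_id.hash(&mut hasher);
            backend.hash(&mut hasher);
            hasher.finish()
        })
        .copied()
}

fn main() {
    let backends = ["mcp-a:3000", "mcp-b:3000", "mcp-c:3000"];
    // The value would normally come from the MCP-Session-ID request header.
    let session_id = "2f4c7a1e-9d3b-4c65-8a10-5b6f2e9d0c71";
    println!("{:?}", pick_backend(session_id, &backends));
}
```

Rendezvous hashing avoids maintaining a hash ring and moves only a small fraction of sessions when a backend is added or removed.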
The default stateful handling is considered a mistake by some; there are plans to make MCP stateless by default (discussion).
Multiplexing
When multiple backends are configured, agentgateway enables MCP multiplexing:
- For tool listing, it sends `tools/list` to every backend, then rewrites tool names to `${backend_name}_${tool_name}` (sketched below).
- Subsequent tool calls are routed to the appropriate backend.
- Methods that cannot be multiplexed return an “invalid method” error.
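The rewriting scheme itself is simple string prefixing. The Rust sketch below (not agentgateway's actual code) shows how advertised tool names could be prefixed and how a later tools/call could be resolved back to its backend; it assumes the backend names are distinct and known to the gateway:

```rust
/// Prefix a tool name with its backend, mirroring the
/// `${backend_name}_${tool_name}` scheme used when merging tools/list results.
fn prefix_tool(backend: &str, tool: &str) -> String {
    format!("{backend}_{tool}")
}

/// Resolve a prefixed tool name from a tools/call request back to
/// (backend, original tool name) by testing each known backend prefix.
fn resolve_tool<'a>(prefixed: &'a str, backends: &[&'a str]) -> Option<(&'a str, &'a str)> {
    backends.iter().find_map(|backend| {
        prefixed
            .strip_prefix(*backend)
            .and_then(|rest| rest.strip_prefix('_'))
            .map(|tool| (*backend, tool))
    })
}

fn main() {
    let backends = ["github", "jira"];
    let advertised = prefix_tool("github", "create_issue"); // "github_create_issue"
    assert_eq!(
        resolve_tool(&advertised, &backends),
        Some(("github", "create_issue"))
    );
}
```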
REST‑to‑MCP conversion
Agentgateway can convert RESTful APIs to MCP tools using an OpenAPI specification:
- It can treat an entire OpenAPI spec as a backend.
- The gateway forwards requests; it does not manage the underlying RESTful APIs.
- Current limitations:
  - Only `application/json` bodies are supported.
  - HTTPS upstreams are not yet supported.
  - Structured output is not supported.
  - Certain schema nuances (e.g., `additionalProperties`) require further handling.
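For a concrete sense of the conversion, here is a rough Rust sketch of how a single OpenAPI operation might be turned into an MCP-style tool definition. The naming fallback and field layout are illustrative assumptions, not agentgateway's actual mapping:

```rust
use serde_json::{json, Value};

/// Derive an MCP-style tool definition from one OpenAPI operation.
fn operation_to_tool(method: &str, path: &str, operation: &Value) -> Value {
    // Prefer operationId for the tool name; otherwise derive one from method + path.
    let name = operation["operationId"]
        .as_str()
        .map(String::from)
        .unwrap_or_else(|| format!("{}_{}", method, path.trim_matches('/').replace('/', "_")));

    json!({
        "name": name,
        "description": operation["summary"].as_str().unwrap_or(""),
        // Only the application/json request body is considered, matching the
        // limitation noted above.
        "inputSchema": operation["requestBody"]["content"]["application/json"]["schema"].clone(),
    })
}

fn main() {
    let op = json!({
        "operationId": "createPet",
        "summary": "Create a pet",
        "requestBody": {
            "content": {
                "application/json": {
                    "schema": { "type": "object", "properties": { "name": { "type": "string" } } }
                }
            }
        }
    });
    println!("{}", operation_to_tool("post", "/pets", &op));
}
```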
Authentication & Authorization
- OAuth-based MCP authentication: protected resource metadata is exposed at paths like `/.well-known/oauth-protected-resource/${resource}`.
  - CORS headers are automatically added to metadata responses, simplifying browser-based MCP clients.
- JWKS handling:
- Public keys are fetched from a JWKS URL or file path.
- The JWKS URL can be derived from the issuer URL and type.
- Keys are loaded only during configuration parsing and are not periodically refreshed (code reference).
- Authorization uses a list of CEL expressions that filter based on JWT fields and MCP attributes. Example:
```yaml
mcpAuthorization:
  rules:
    # Allow anyone to call 'echo'
    - 'mcp.tool.name == "echo"'
    # Only the test-user can call 'add'
    - 'jwt.sub == "test-user" && mcp.tool.name == "add"'
    # Authenticated users with claim nested.key == "value" can access 'printEnv'
    - 'mcp.tool.name == "printEnv" && jwt.nested.key == "value"'
```
In multiplexing scenarios, `mcpAuthorization` runs before tool names are merged, so the rules see the original tool names (without backend prefixes).
Metrics
Agentgateway currently provides only a basic mcp_requests counter, lacking detailed per‑tool or latency metrics.
A2A
For Agent‑to‑Agent (A2A) protocol scenarios, agentgateway implements two primary features:
- URL rewriting – Agent card URLs are rewritten to point to the gateway instead of the proxied backend.
- Request parsing – A2A JSON requests are parsed and the request method is recorded for observability.
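The URL rewrite amounts to swapping the agent card's advertised address for the gateway's own. A simplified Rust sketch using serde_json; the card shown is a minimal subset of the A2A agent card schema, and the gateway URL is a placeholder:

```rust
use serde_json::{json, Value};

/// Rewrite the `url` field of an A2A agent card so clients talk to the
/// gateway instead of the proxied agent directly.
fn rewrite_agent_card(mut card: Value, gateway_url: &str) -> Value {
    if card.get("url").is_some() {
        card["url"] = json!(gateway_url);
    }
    card
}

fn main() {
    let card = json!({
        "name": "currency-agent",
        "url": "http://10.0.0.12:9999/",
        "capabilities": { "streaming": true }
    });
    println!(
        "{}",
        rewrite_agent_card(card, "https://gateway.example.com/a2a/currency-agent")
    );
}
```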
Proxying Inference Requests to LLM Providers
Agentgateway can proxy inference requests to large language model (LLM) providers, adding value beyond raw forwarding:
- Observability – Token usage and time‑to‑first‑token (TTFT) metrics are collected for Server‑Sent Events (SSE) streams.
- Streaming support – Dedicated parsers handle non‑SSE streaming formats such as AWS Bedrock’s event stream.
- Rate limiting & prompt protection – (Details to be covered in a follow‑up article.)
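The TTFT measurement mentioned above boils down to timing the gap between issuing the request and seeing the first SSE data event. A simplified, blocking Rust sketch; the real gateway does this on a streaming response body:

```rust
use std::io::{BufRead, BufReader, Read};
use std::time::{Duration, Instant};

/// Return the elapsed time from `start` until the first SSE `data:` event
/// that carries content, i.e. a rough time-to-first-token for the stream.
fn time_to_first_token<R: Read>(body: R, start: Instant) -> Option<Duration> {
    for line in BufReader::new(body).lines() {
        let line = line.ok()?;
        if line.starts_with("data:") && !line.contains("[DONE]") {
            return Some(start.elapsed());
        }
    }
    None
}

fn main() {
    // Stand-in for an OpenAI-style SSE body; a real one arrives incrementally.
    let fake_stream = "data: {\"choices\":[{\"delta\":{\"content\":\"Hi\"}}]}\n\ndata: [DONE]\n";
    let start = Instant::now();
    println!("{:?}", time_to_first_token(fake_stream.as_bytes(), start));
}
```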
Provider‑agnostic API surface
Agentgateway lifts some LLM client features into the gateway to reduce integration effort, offering an OpenAI‑compatible external API. It currently supports two route types:
| Provider | Route |
|---|---|
| OpenAI | /v1/chat/completions |
| Anthropic | /v1/messages |
Both routes are chat‑style endpoints; OpenAI’s /v1/chat/completions is functionally equivalent to Anthropic’s /v1/messages. Implementing both separately simplifies onboarding for agents that target only one provider.
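In practice this means an agent that already speaks the OpenAI API only needs to point its base URL at the gateway. A minimal Rust sketch using the reqwest crate (blocking and json features); the gateway address, port, and model name are placeholders:

```rust
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();
    // Send an OpenAI-style chat completion through the gateway's
    // /v1/chat/completions route; the gateway handles the provider behind it.
    let resp = client
        .post("http://localhost:3000/v1/chat/completions")
        .json(&json!({
            "model": "gpt-4o-mini",
            "messages": [{ "role": "user", "content": "Hello from behind the gateway" }]
        }))
        .send()?;
    println!("{}", resp.text()?);
    Ok(())
}
```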
Limitations:
- Structured output (e.g., OpenAI’s structured outputs) is not yet supported.
- Embeddings, batching, and other advanced features are still missing.
Inference Extension Support
The Gateway API Inference Extension (https://gateway-api-inference-extension.sigs.k8s.io/) enables distributed inference via a scheduler (EPP) that communicates with the gateway using Envoy’s gRPC ext_proc protocol.
- The scheduler returns an `x-gateway-destination-endpoint` header indicating the target upstream address.
- The gateway forwards the inference request to that endpoint, effectively acting as a thin proxy.
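Conceptually, the contract is small: the EPP picks an endpoint and the gateway honors the header. The Rust sketch below fakes that decision with a queue-depth heuristic; the real EPP scoring and the ext_proc plumbing are considerably more involved, and the endpoint addresses are illustrative:

```rust
use std::collections::HashMap;

/// Pick a model-server endpoint and express the decision as the header the
/// gateway expects. Queue depth stands in for the real EPP scoring, which
/// also weighs factors such as KV-cache usage and loaded LoRA adapters.
fn schedule(endpoints: &[(&str, usize)]) -> Option<HashMap<String, String>> {
    let (addr, _queued) = endpoints.iter().min_by_key(|e| e.1)?;
    let mut headers = HashMap::new();
    headers.insert(
        "x-gateway-destination-endpoint".to_string(),
        addr.to_string(),
    );
    Some(headers)
}

fn main() {
    let endpoints = [("10.0.0.5:8000", 3), ("10.0.0.6:8000", 1)];
    println!("{:?}", schedule(&endpoints));
}
```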
Red Hat’s involvement (through the llm-d project) and its investment in AI tooling (e.g., vLLM) suggest the inference extension could gain traction as a standard component for AI workloads.