Agentgateway Review: A Feature-Rich New AI Gateway
Source: Dev.to
Introduction
agentgateway is a data plane developed by Solo.io specifically for AI scenarios. Written in Rust, it can be configured via xDS (a gRPC-based protocol) or YAML. kgateway recently replaced its Envoy-based AI data plane with the open-source version of agentgateway; the enterprise version of Gloo is expected to follow.
The gateway supports four AI scenarios:
- MCP
- A2A
- Proxying inference requests to LLM providers
- Load balancing for inference services
Below is an overview of each scenario. Note that the discussion focuses on the open‑source agentgateway; some features may be exclusive to the enterprise edition.
MCP
Agentgateway was originally created to handle stateful Model Context Protocol (MCP) requests, which were difficult to manage with existing Envoy data planes. Consequently, its MCP support is the most complete.
Session handling
- By default, MCP is treated as a stateful protocol.
- A `SessionManager` struct manages session creation and maintenance (code link).
- The `SessionManager` is an in-process store, so multiple agentgateway instances do not share session state.
- For sticky sessions toward upstreams, it is simpler to consistent-hash on the `MCP-Session-ID` header, ensuring the same session ID routes to the same backend even across different instances (see the sketch below).
- Extending `SessionManager` to use a remote store is possible but adds overhead.
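Consistent hashing on the session header is a generic technique rather than anything agentgateway-specific. A minimal Rust sketch of the idea, using rendezvous (highest-random-weight) hashing over hypothetical backend addresses, could look like this:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Pick a backend for a given MCP session ID using rendezvous hashing, so the
/// same session ID maps to the same backend as long as the backend set is
/// unchanged. Backend addresses here are purely illustrative.
fn pick_backend<'a>(session_id: &str, backends: &[&'a str]) -> Option<&'a str> {
    backends
        .iter()
        .max_by_key(|backend| {
            // Score each (session, backend) pair; the highest score wins.
            // DefaultHasher is not stable across Rust releases; a deployment
            // that needs separate instances to agree would use a fixed hash.
            let mut hasher = DefaultHasher::new();
            session_id.hash(&mut hasher);
            backend.hash(&mut hasher);
            hasher.finish()
        })
        .copied()
}

fn main() {
    let backends = ["mcp-a:3000", "mcp-b:3000", "mcp-c:3000"];
    // The value would normally come from the MCP-Session-ID request header.
    let session_id = "2f4c7a1e-9d3b-4c65-8a10-5b6f2e9d0c71";
    println!("{:?}", pick_backend(session_id, &backends));
}
```

Rendezvous hashing avoids maintaining a hash ring and moves only a small fraction of sessions when a backend is added or removed.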
The default stateful handling is considered a mistake by some; there are plans to make MCP stateless by default (discussion).
Multiplexing
When multiple backends are configured, agentgateway enables MCP multiplexing:
- For tool listing, it sends `tools/list` to every backend, then rewrites tool names to `${backend_name}_${tool_name}` (sketched below).
- Subsequent tool calls are routed to the appropriate backend.
- Methods that cannot be multiplexed return an “invalid method” error.
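The rewriting scheme itself is simple string prefixing. The Rust sketch below (not agentgateway's actual code) shows how advertised tool names could be prefixed and how a later tools/call could be resolved back to its backend; it assumes the backend names are distinct and known to the gateway:

```rust
/// Prefix a tool name with its backend, mirroring the
/// `${backend_name}_${tool_name}` scheme used when merging tools/list results.
fn prefix_tool(backend: &str, tool: &str) -> String {
    format!("{backend}_{tool}")
}

/// Resolve a prefixed tool name from a tools/call request back to
/// (backend, original tool name) by testing each known backend prefix.
fn resolve_tool<'a>(prefixed: &'a str, backends: &[&'a str]) -> Option<(&'a str, &'a str)> {
    backends.iter().find_map(|backend| {
        prefixed
            .strip_prefix(*backend)
            .and_then(|rest| rest.strip_prefix('_'))
            .map(|tool| (*backend, tool))
    })
}

fn main() {
    let backends = ["github", "jira"];
    let advertised = prefix_tool("github", "create_issue"); // "github_create_issue"
    assert_eq!(
        resolve_tool(&advertised, &backends),
        Some(("github", "create_issue"))
    );
}
```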
REST‑to‑MCP conversion
Agentgateway can convert RESTful APIs to MCP tools using an OpenAPI specification:
- It can treat an entire OpenAPI spec as a backend.
- The gateway forwards requests; it does not manage the underlying RESTful APIs.
- Current limitations:
  - Only `application/json` bodies are supported.
  - HTTPS upstreams are not yet supported.
  - Structured output is not supported.
  - Certain schema nuances (e.g., `additionalProperties`) require further handling.
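For a concrete sense of the conversion, here is a rough Rust sketch of how a single OpenAPI operation might be turned into an MCP-style tool definition. The naming fallback and field layout are illustrative assumptions, not agentgateway's actual mapping:

```rust
use serde_json::{json, Value};

/// Derive an MCP-style tool definition from one OpenAPI operation.
fn operation_to_tool(method: &str, path: &str, operation: &Value) -> Value {
    // Prefer operationId for the tool name; otherwise derive one from method + path.
    let name = operation["operationId"]
        .as_str()
        .map(String::from)
        .unwrap_or_else(|| format!("{}_{}", method, path.trim_matches('/').replace('/', "_")));

    json!({
        "name": name,
        "description": operation["summary"].as_str().unwrap_or(""),
        // Only the application/json request body is considered, matching the
        // limitation noted above.
        "inputSchema": operation["requestBody"]["content"]["application/json"]["schema"].clone(),
    })
}

fn main() {
    let op = json!({
        "operationId": "createPet",
        "summary": "Create a pet",
        "requestBody": {
            "content": {
                "application/json": {
                    "schema": { "type": "object", "properties": { "name": { "type": "string" } } }
                }
            }
        }
    });
    println!("{}", operation_to_tool("post", "/pets", &op));
}
```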
Authentication & Authorization
- OAuth-based MCP authentication: protected resource metadata is exposed at paths like `/.well-known/oauth-protected-resource/${resource}`.
  - CORS headers are automatically added to metadata responses, simplifying browser-based MCP clients.
- JWKS handling:
- Public keys are fetched from a JWKS URL or file path.
- The JWKS URL can be derived from the issuer URL and type.
- Keys are loaded only during configuration parsing and are not periodically refreshed (code reference).
- Authorization uses a list of CEL expressions that filter based on JWT fields and MCP attributes. Example:
```yaml
mcpAuthorization:
  rules:
    # Allow anyone to call 'echo'
    - 'mcp.tool.name == "echo"'
    # Only the test-user can call 'add'
    - 'jwt.sub == "test-user" && mcp.tool.name == "add"'
    # Authenticated users with claim nested.key == "value" can access 'printEnv'
    - 'mcp.tool.name == "printEnv" && jwt.nested.key == "value"'
```
In multiplexing scenarios, `mcpAuthorization` runs before tool names are merged, so the rules see the original tool names (without backend prefixes).
Metrics
Agentgateway currently provides only a basic mcp_requests counter, lacking detailed per‑tool or latency metrics.
A2A
For Agent‑to‑Agent (A2A) protocol scenarios, agentgateway implements two primary features:
- URL rewriting – Agent card URLs are rewritten to point to the gateway instead of the proxied backend.
- Request parsing – A2A JSON requests are parsed and the request method is recorded for observability.
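The URL rewrite amounts to swapping the agent card's advertised address for the gateway's own. A simplified Rust sketch using serde_json; the card shown is a minimal subset of the A2A agent card schema, and the gateway URL is a placeholder:

```rust
use serde_json::{json, Value};

/// Rewrite the `url` field of an A2A agent card so clients talk to the
/// gateway instead of the proxied agent directly.
fn rewrite_agent_card(mut card: Value, gateway_url: &str) -> Value {
    if card.get("url").is_some() {
        card["url"] = json!(gateway_url);
    }
    card
}

fn main() {
    let card = json!({
        "name": "currency-agent",
        "url": "http://10.0.0.12:9999/",
        "capabilities": { "streaming": true }
    });
    println!(
        "{}",
        rewrite_agent_card(card, "https://gateway.example.com/a2a/currency-agent")
    );
}
```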
Proxying Inference Requests to LLM Providers
Agentgateway can proxy inference requests to large language model (LLM) providers, adding value beyond raw forwarding:
- Observability – Token usage and time‑to‑first‑token (TTFT) metrics are collected for Server‑Sent Events (SSE) streams.
- Streaming support – Dedicated parsers handle non‑SSE streaming formats such as AWS Bedrock’s event stream.
- Rate limiting & prompt protection – (Details to be covered in a follow‑up article.)
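The TTFT measurement mentioned above boils down to timing the gap between issuing the request and seeing the first SSE data event. A simplified, blocking Rust sketch; the real gateway does this on a streaming response body:

```rust
use std::io::{BufRead, BufReader, Read};
use std::time::{Duration, Instant};

/// Return the elapsed time from `start` until the first SSE `data:` event
/// that carries content, i.e. a rough time-to-first-token for the stream.
fn time_to_first_token<R: Read>(body: R, start: Instant) -> Option<Duration> {
    for line in BufReader::new(body).lines() {
        let line = line.ok()?;
        if line.starts_with("data:") && !line.contains("[DONE]") {
            return Some(start.elapsed());
        }
    }
    None
}

fn main() {
    // Stand-in for an OpenAI-style SSE body; a real one arrives incrementally.
    let fake_stream = "data: {\"choices\":[{\"delta\":{\"content\":\"Hi\"}}]}\n\ndata: [DONE]\n";
    let start = Instant::now();
    println!("{:?}", time_to_first_token(fake_stream.as_bytes(), start));
}
```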
Provider‑agnostic API surface
Agentgateway lifts some LLM client features into the gateway to reduce integration effort, offering an OpenAI‑compatible external API. It currently supports two route types:
| Provider | Route |
|---|---|
| OpenAI | /v1/chat/completions |
| Anthropic | /v1/messages |
Both routes are chat‑style endpoints; OpenAI’s /v1/chat/completions is functionally equivalent to Anthropic’s /v1/messages. Implementing both separately simplifies onboarding for agents that target only one provider.
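In practice this means an agent that already speaks the OpenAI API only needs to point its base URL at the gateway. A minimal Rust sketch using the reqwest crate (blocking and json features); the gateway address, port, and model name are placeholders:

```rust
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();
    // Send an OpenAI-style chat completion through the gateway's
    // /v1/chat/completions route; the gateway handles the provider behind it.
    let resp = client
        .post("http://localhost:3000/v1/chat/completions")
        .json(&json!({
            "model": "gpt-4o-mini",
            "messages": [{ "role": "user", "content": "Hello from behind the gateway" }]
        }))
        .send()?;
    println!("{}", resp.text()?);
    Ok(())
}
```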
Limitations:
- Structured output (e.g., OpenAI’s structured outputs) is not yet supported.
- Embeddings, batching, and other advanced features are still missing.
Inference Extension Support
The Gateway API Inference Extension (https://gateway-api-inference-extension.sigs.k8s.io/) enables distributed inference via a scheduler (EPP) that communicates with the gateway using Envoy’s gRPC ext_proc protocol.
- The scheduler returns an `x-gateway-destination-endpoint` header indicating the target upstream address.
- The gateway forwards the inference request to that endpoint, effectively acting as a thin proxy.
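Conceptually, the contract is small: the EPP picks an endpoint and the gateway honors the header. The Rust sketch below fakes that decision with a queue-depth heuristic; the real EPP scoring and the ext_proc plumbing are considerably more involved, and the endpoint addresses are illustrative:

```rust
use std::collections::HashMap;

/// Pick a model-server endpoint and express the decision as the header the
/// gateway expects. Queue depth stands in for the real EPP scoring, which
/// also weighs factors such as KV-cache usage and loaded LoRA adapters.
fn schedule(endpoints: &[(&str, usize)]) -> Option<HashMap<String, String>> {
    let (addr, _queued) = endpoints.iter().min_by_key(|e| e.1)?;
    let mut headers = HashMap::new();
    headers.insert(
        "x-gateway-destination-endpoint".to_string(),
        addr.to_string(),
    );
    Some(headers)
}

fn main() {
    let endpoints = [("10.0.0.5:8000", 3), ("10.0.0.6:8000", 1)];
    println!("{:?}", schedule(&endpoints));
}
```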
Red Hat’s involvement (through the llm-d project) and its investment in AI tooling (e.g., vLLM) suggest the inference extension could gain traction as a standard component for AI workloads.