We Evaluated 13 LLM Gateways for Production. Here's What We Found

Published: December 14, 2025, 01:35 PM EST
2 min read
Source: Dev.to

Why We Needed This

Our team builds AI evaluation and observability tools at Maxim.
We work with companies running production AI systems, and the same question kept coming up:

“Which LLM gateway should we use?”

So we decided to actually test them rather than just read docs or check GitHub stars.
We ran real production workloads through 13 different LLM gateways and measured what actually happens.

What We Tested

We evaluated gateways across five categories:

  • Performance – latency, throughput, memory usage
  • Features – routing, caching, observability, failover
  • Integration – how easy it is to drop into existing code
  • Cost – pricing model and hidden costs
  • Production‑readiness – stability, monitoring, enterprise features

Test workload

  • 500 RPS sustained traffic
  • Mix of GPT‑4 and Claude requests
  • Real customer support queries
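
We can't publish the full harness here, but as a rough illustration of the setup, a sustained-RPS test can be driven with a small async script like the one below. The gateway URL, model names, and query text are placeholders rather than our actual configuration; treat it as a sketch of the method, not the benchmark itself.

```python
# Illustrative load generator: paced requests at a target RPS against a gateway's
# OpenAI-compatible chat endpoint. URL, models, and payloads are placeholders.
import asyncio
import itertools
import time

import httpx

GATEWAY_URL = "http://localhost:8080/v1/chat/completions"  # hypothetical gateway address
MODELS = itertools.cycle(["gpt-4", "claude-3-sonnet"])      # mixed GPT-4 / Claude traffic
TARGET_RPS = 500
DURATION_S = 60

async def fire(client: httpx.AsyncClient, model: str, latencies: list) -> None:
    payload = {"model": model, "messages": [{"role": "user", "content": "Where is my order?"}]}
    start = time.perf_counter()
    try:
        await client.post(GATEWAY_URL, json=payload, timeout=30)
    except httpx.HTTPError:
        pass  # a real harness would count failures separately
    finally:
        latencies.append(time.perf_counter() - start)

async def main() -> None:
    latencies = []
    limits = httpx.Limits(max_connections=1000)
    async with httpx.AsyncClient(limits=limits) as client:
        tasks = []
        for _ in range(TARGET_RPS * DURATION_S):
            tasks.append(asyncio.create_task(fire(client, next(MODELS), latencies)))
            await asyncio.sleep(1 / TARGET_RPS)  # pace submissions to ~TARGET_RPS
        await asyncio.gather(*tasks)
    latencies.sort()
    print(f"p50={latencies[len(latencies) // 2]:.3f}s  p99={latencies[int(len(latencies) * 0.99)]:.3f}s")

asyncio.run(main())
```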

The Results (Honest Take)

Tier 1: Production‑Ready at Scale

1. Bifrost (Ours — but hear us out)

We built Bifrost because nothing else met our scale requirements.

Pros

  • Fastest in our tests (~11 µs overhead at 5K RPS)
  • Rock‑solid memory usage (~1.4 GB stable under load)
  • Semantic caching actually works (semantically similar prompts are served from cache instead of hitting the provider)
  • Adaptive load balancing automatically downweights degraded keys
  • Open source (MIT)

Cons

  • Smaller community than LiteLLM
  • Go‑based (great for performance, harder for Python‑only teams)
  • Fewer provider integrations than older tools

Best for: High‑throughput production (500+ RPS), teams prioritizing performance and cost efficiency
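
Bifrost, like most gateways here, is used as a drop-in HTTP proxy. Assuming an OpenAI-compatible endpoint (check the Bifrost docs for the actual path and model naming), integration looks roughly like this; the base URL, key, and model name below are placeholders.

```python
# Pointing the standard OpenAI SDK at an OpenAI-compatible gateway endpoint.
# base_url and api_key are placeholders, not Bifrost's actual defaults.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # hypothetical local gateway address
    api_key="gateway-key",                # the gateway typically holds the real provider keys
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Summarize this support ticket."}],
)
print(response.choices[0].message.content)
```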

2. Portkey

Strong commercial offering with solid enterprise features.

Pros

  • Excellent observability UI
  • Good multi‑provider support
  • Reliability features (fallbacks, retries)
  • Enterprise support

Cons

  • Pricing scales up quickly at volume
  • Platform lock‑in
  • Some latency overhead vs. open‑source tools

Best for: Enterprises that want a fully managed solution

3. Kong

API‑gateway giant with an LLM plugin.

Pros

  • Battle‑tested infrastructure
  • Massive plugin ecosystem
  • Enterprise features (auth, rate limiting)
  • Multi‑cloud support

Cons

  • Complex setup for LLM‑specific workflows
  • Overkill if you just need LLM routing
  • Steep learning curve

Best for: Teams already using Kong that want LLM support

Tier 2: Good for Most Use Cases

4. LiteLLM

The most popular open‑source option. We used this before Bifrost.

Pros

  • Huge community
  • Supports almost every provider
  • Python‑friendly
  • Easy to get started

Cons

  • Performance issues above ~300 RPS (we hit this)
  • Memory usage grows over time
  • P99 latency spikes under load

Best for: Prototyping and low‑traffic apps where median (P50) latency is all that matters
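
The "easy to get started" point holds up: LiteLLM routes calls through one function and returns responses in OpenAI format regardless of provider. A minimal example, assuming provider API keys are set in the environment and using example model names:

```python
# Minimal LiteLLM usage: one function, provider chosen by model name,
# responses returned in OpenAI format. Model names here are examples.
from litellm import completion

gpt = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Where is my order?"}],
)
claude = completion(
    model="claude-3-sonnet-20240229",
    messages=[{"role": "user", "content": "Where is my order?"}],
)

print(gpt.choices[0].message.content)
print(claude.choices[0].message.content)
```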

Evaluation Criteria

  • Total cost (not list pricing) – Infra + LLM usage + engineering time + lock‑in.
  • Observability – Can you debug failures, latency, and cost?
  • Reliability – Failover, rate limits, auto‑recovery.
  • Migration path – Can you leave later? Can you self‑host?
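
On the reliability point: much of a gateway's value is that you stop hand-rolling failover and retry logic yourself. The sketch below shows the kind of code a gateway replaces; the provider functions are hypothetical stubs, and real gateways add rate-limit awareness, health checks, and key rotation on top.

```python
# Hand-rolled failover with retries and backoff: the logic a gateway owns for you.
# call_openai / call_anthropic are hypothetical stand-ins for real provider SDK calls.
import time


class ProviderError(Exception):
    """Raised by a provider call on rate limits, timeouts, or 5xx responses."""


def call_openai(prompt: str) -> str:
    raise ProviderError("stub: replace with a real OpenAI call")


def call_anthropic(prompt: str) -> str:
    raise ProviderError("stub: replace with a real Anthropic call")


PROVIDERS = [call_openai, call_anthropic]


def complete_with_failover(prompt: str, retries_per_provider: int = 2) -> str:
    last_error = None
    for provider in PROVIDERS:
        for attempt in range(retries_per_provider):
            try:
                return provider(prompt)
            except ProviderError as exc:
                last_error = exc
                time.sleep(2 ** attempt)  # exponential backoff before the next attempt
    raise RuntimeError("all providers failed") from last_error
```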

Our Recommendations

  • Most teams starting out: LiteLLM → migrate once traffic approaches the ~300 RPS range where we hit issues
  • High‑growth startups: Bifrost or Portkey from day one
  • Enterprises: Portkey or Kong
  • Cost‑sensitive teams: Bifrost (open‑source) or Helicone for observability‑focused setups