Add Observability, Routing, and Failover to Your LLM Stack With One URL Change

Published: December 20, 2025 at 09:02 PM EST
4 min read
Source: Dev.to

Bifrost is a high‑performance AI gateway that unifies access to 15+ providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex, …) through a single OpenAI‑compatible API. Deploy in seconds with zero configuration and get automatic failover, load balancing, semantic caching, and enterprise‑grade observability.

Why Bifrost?

If your LLM application already works, you shouldn’t have to refactor it just to add:

  • Observability
  • Load balancing
  • Caching
  • Provider failover

Most solutions force you to:

  • Rewrite API calls
  • Learn a new SDK
  • Refactor stable code
  • Re‑test everything (risky & expensive)

Bifrost avoids all that. Drop it in, change one URL, and you’re done.

Quick Start

Go from zero to a production‑ready AI gateway in under a minute.

1️⃣ Start the Bifrost Gateway

Install and run locally:

npx -y @maximhq/bifrost

Or use Docker:

docker run -p 8080:8080 maximhq/bifrost

2️⃣ Configure via the Web UI

Open the built-in interface:

open http://localhost:8080

3️⃣ Make your first API call

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello, Bifrost!"}]
  }'

That’s it – your AI gateway is running with a web UI for visual configuration and real‑time monitoring.
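
If you prefer Python for that first request, the equivalent call with the requests library looks like this (a minimal sketch; it assumes the same local gateway and the openai/gpt-4o-mini model from the curl example above):

import requests

# Same request as the curl example, sent to the local Bifrost gateway
response = requests.post(
    "http://localhost:8080/v1/chat/completions",
    headers={"Content-Type": "application/json"},
    json={
        "model": "openai/gpt-4o-mini",
        "messages": [{"role": "user", "content": "Hello, Bifrost!"}],
    },
)
print(response.json())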

View on GitHub

OpenAI‑Compatible – One‑Line Change

If your code already works with OpenAI, it works with Bifrost.

import openai

# Original OpenAI usage
openai.api_key = "sk-..."
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

# 👉 Switch to Bifrost – only the base URL changes
openai.api_base = "http://localhost:8080/openai"
openai.api_key = "sk-..."
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

Everything else stays the same. Because Bifrost is OpenAI‑compatible, it works with any framework that already supports OpenAI.
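
The snippet above uses the legacy module-level openai interface; if you are on the v1+ openai Python SDK, the same one-URL change applies (a minimal sketch, assuming the /openai route shown above):

from openai import OpenAI

# Point the v1 SDK at Bifrost instead of api.openai.com
client = OpenAI(
    base_url="http://localhost:8080/openai",
    api_key="sk-...",
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
)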

LangChain

from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(
    openai_api_base="http://localhost:8080/langchain",
    openai_api_key="sk-..."
)

LlamaIndex

from llama_index.llms import OpenAI

llm = OpenAI(
    api_base="http://localhost:8080/openai",
    api_key="sk-..."
)

LiteLLM

import litellm

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    base_url="http://localhost:8080/litellm"
)

Anthropic SDK

import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:8080/anthropic",
    api_key="sk-ant-..."
)

Pattern: update the base URL, keep the rest of your code unchanged.

Multi‑Provider Routing

Define providers in a single JSON/YAML config. Bifrost routes requests to the appropriate backend based on the model name.

{
  "providers": [
    {
      "name": "openai",
      "api_key": "sk-...",
      "models": ["gpt-4", "gpt-4o-mini"]
    },
    {
      "name": "anthropic",
      "api_key": "sk-ant-...",
      "models": ["claude-sonnet-4", "claude-opus-4"]
    },
    {
      "name": "azure",
      "api_key": "...",
      "endpoint": "https://your-resource.openai.azure.com"
    }
  ]
}

# Routes to OpenAI
response = client.chat.completions.create(
    model="gpt-4",
    messages=[...]
)

# Routes to Anthropic
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[...]
)

Switch providers simply by changing the model name – no refactoring required.

Built‑In Observability

Plugins (e.g., Maxim)

{
  "plugins": [
    {
      "name": "maxim",
      "config": {
        "api_key": "your-maxim-key",
        "repo_id": "your-repo-id"
      }
    }
  ]
}

Every request is automatically traced in the Maxim dashboard – no instrumentation code needed.

Metrics (Prometheus)

{
  "metrics": {
    "enabled": true,
    "port": 9090
  }
}

Metrics are exposed at /metrics and can be scraped by Prometheus.
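
A quick sanity check of the exporter (a sketch, assuming the metrics port configured above):

import requests

# Fetch the Prometheus exposition text from Bifrost's metrics endpoint
metrics = requests.get("http://localhost:9090/metrics", timeout=5)
print(metrics.text[:500])  # first few metric families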

OpenTelemetry (OTLP)

{
  "otel": {
    "enabled": true,
    "endpoint": "http://your-collector:4318"
  }
}

Standard OTLP export to any OpenTelemetry‑compatible collector.

Provider‑Specific Example (Claude)

{
  "baseURL": "http://localhost:8080/openai",
  "provider": "anthropic"
}

All Claude requests now flow through Bifrost, enabling cost tracking, token usage, and caching. Tools that support custom provider definitions can be pointed at Bifrost's OpenAI-compatible endpoint in the same way:

custom:
  - name: "Bifrost"
    apiKey: "dummy"
    baseURL: "http://localhost:8080/v1"
    models:
      default: ["openai/gpt-4o"]

Model Context Protocol (MCP) – Tool Calling & Shared Context

{
  "mcp": {
    "servers": [
      {
        "name": "filesystem",
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem"]
      },
      {
        "name": "brave-search",
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-brave-search"],
        "env": {
          "BRAVE_API_KEY": "your-key"
        }
      }
    ]
  }
}

Once configured, your LLM calls automatically gain access to MCP tools.
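
In practice that means an ordinary chat completion can reach those tools; a hedged sketch (assuming the brave-search server configured above and the v1 openai SDK pointed at Bifrost):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/openai", api_key="sk-...")

# Bifrost makes the configured MCP servers available as tools for this call
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Search the web for the latest Bifrost release notes."}],
)
print(response.choices[0].message.content)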

Deployment Examples

Docker (quick test)

docker run -p 8080:8080 \
  -e OPENAI_API_KEY=sk-... \
  maximhq/bifrost:latest

Docker‑Compose

services:
  bifrost:
    image: maximhq/bifrost:latest
    ports:
      - "8080:8080"
    environment:
      - OPENAI_API_KEY=sk-...
    volumes:
      - ./data:/app/data

Kubernetes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: bifrost
spec:
  replicas: 3
  selector:
    matchLabels:
      app: bifrost
  template:
    metadata:
      labels:
        app: bifrost
    spec:
      containers:
        - name: bifrost
          image: maximhq/bifrost:latest
          ports:
            - containerPort: 8080

Terraform

See the official documentation for full examples.

Summary

  • Observability – ✅ (auto‑traced, metrics, OTLP)
  • Semantic Caching – ✅
  • Multi‑Key Load Balancing – ✅
  • Provider Failover – ✅
  • MCP Tool Calling – ✅
  • One‑Line Integration – ✅

Migration steps (≈10 minutes):

  1. Run Bifrost.
  2. Add provider API keys (via UI or config).
  3. Update the base URL in your code.
  4. Test a single request.
  5. Deploy (Docker, K8s, etc.).

All features are enabled automatically – no code changes beyond the URL. Enjoy a resilient, observable, and scalable LLM stack with zero refactoring.

Quick Integration Checklist

  • OpenAI‑compatible API
  • One URL change
  • Multi‑provider routing
  • Built‑in observability
  • No refactoring required
  • No new SDKs
  • No code rewrites

Just drop it in.

Built by the team at Maxim AI.
