Add Observability, Routing, and Failover to Your LLM Stack With One URL Change
Source: Dev.to
Bifrost is a high‑performance AI gateway that unifies access to 15+ providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex, …) through a single OpenAI‑compatible API. Deploy in seconds with zero configuration and get automatic failover, load balancing, semantic caching, and enterprise‑grade observability.
Why Bifrost?
If your LLM application already works, you shouldn’t have to refactor it just to add:
- Observability
- Load balancing
- Caching
- Provider failover
Most solutions force you to:
- Rewrite API calls
- Learn a new SDK
- Refactor stable code
- Re‑test everything (risky & expensive)
Bifrost avoids all that. Drop it in, change one URL, and you’re done.
Quick Start
Go from zero to a production‑ready AI gateway in under a minute.
1️⃣ Start Bifrost Gateway – install and run locally:

```bash
npx -y @maximhq/bifrost
```

Or use Docker:

```bash
docker run -p 8080:8080 maximhq/bifrost
```

2️⃣ Configure via Web UI – open the built‑in interface:

```bash
open http://localhost:8080
```

3️⃣ Make your first API call:

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello, Bifrost!"}]
  }'
```
That’s it – your AI gateway is running with a web UI for visual configuration and real‑time monitoring.
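A successful call returns the standard OpenAI chat‑completions shape, so existing response parsing keeps working. Roughly like this (field values are illustrative, not actual output):

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "openai/gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help?" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 11, "completion_tokens": 8, "total_tokens": 19 }
}
```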
OpenAI‑Compatible – One‑Line Change
If your code already works with OpenAI, it works with Bifrost.
```python
import openai

# Original OpenAI usage
openai.api_key = "sk-..."
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

# 👉 Switch to Bifrost – only the base URL changes
openai.api_base = "http://localhost:8080/openai"
openai.api_key = "sk-..."
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)
```
Everything else stays the same. Because Bifrost is OpenAI‑compatible, it works with any framework that already supports OpenAI.
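The snippet above uses the legacy pre‑1.0 `openai` module. On the current SDK (`openai>=1.0`) the change is just as small – only the client's `base_url` moves (the path here mirrors the example above; adjust it to match your Bifrost setup):

```python
from openai import OpenAI

# Point the v1 SDK client at Bifrost instead of api.openai.com
client = OpenAI(
    base_url="http://localhost:8080/openai",  # only this line changes
    api_key="sk-...",
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
)
```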
LangChain
```python
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(
    openai_api_base="http://localhost:8080/langchain",
    openai_api_key="sk-..."
)
```
LlamaIndex
```python
from llama_index.llms import OpenAI

llm = OpenAI(
    api_base="http://localhost:8080/openai",
    api_key="sk-..."
)
```
LiteLLM
```python
import litellm

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    base_url="http://localhost:8080/litellm"
)
```
Anthropic SDK
```python
import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:8080/anthropic",
    api_key="sk-ant-..."
)
```
Pattern: update the base URL, keep the rest of your code unchanged.
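And because it's just an OpenAI‑compatible HTTP endpoint, no SDK is required at all – any HTTP client can call the same route used in the Quick Start. A minimal sketch with `requests`:

```python
import requests

# Same OpenAI-compatible endpoint as the Quick Start curl example
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "openai/gpt-4o-mini",
        "messages": [{"role": "user", "content": "Hello, Bifrost!"}],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```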
Multi‑Provider Routing
Define providers in a single JSON/YAML config. Bifrost routes requests to the appropriate backend based on the model name.
```json
{
  "providers": [
    {
      "name": "openai",
      "api_key": "sk-...",
      "models": ["gpt-4", "gpt-4o-mini"]
    },
    {
      "name": "anthropic",
      "api_key": "sk-ant-...",
      "models": ["claude-sonnet-4", "claude-opus-4"]
    },
    {
      "name": "azure",
      "api_key": "...",
      "endpoint": "https://your-resource.openai.azure.com"
    }
  ]
}
```
```python
from openai import OpenAI

# One client, many providers – Bifrost routes by model name
# (base URL as in the Quick Start example)
client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-...")

# Routes to OpenAI
response = client.chat.completions.create(
    model="gpt-4",
    messages=[...]
)

# Routes to Anthropic
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[...]
)
```
Switch providers simply by changing the model name – no refactoring required.
Built‑In Observability
Plugins (e.g., Maxim)
```json
{
  "plugins": [
    {
      "name": "maxim",
      "config": {
        "api_key": "your-maxim-key",
        "repo_id": "your-repo-id"
      }
    }
  ]
}
```
Every request is automatically traced in the Maxim dashboard – no instrumentation code needed.
Metrics (Prometheus)
```json
{
  "metrics": {
    "enabled": true,
    "port": 9090
  }
}
```
Metrics are exposed at /metrics and can be scraped by Prometheus.
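To scrape them with Prometheus, add a job pointing at the metrics port configured above (the job name and target host are examples; adjust them to wherever Bifrost runs):

```yaml
# prometheus.yml (excerpt)
scrape_configs:
  - job_name: "bifrost"
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:9090"]  # metrics port from the Bifrost config above
```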
OpenTelemetry (OTLP)
```json
{
  "otel": {
    "enabled": true,
    "endpoint": "http://your-collector:4318"
  }
}
```
Standard OTLP export to any OpenTelemetry‑compatible collector.
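On the receiving side, any collector with an OTLP/HTTP receiver on port 4318 will accept these traces. A minimal OpenTelemetry Collector config sketch (the `debug` exporter is a stand‑in – swap in your real backend's exporter):

```yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318   # matches the endpoint Bifrost exports to

exporters:
  debug: {}                      # placeholder; replace with your tracing backend

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug]
```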
Provider‑Specific Example (Claude)
```json
{
  "baseURL": "http://localhost:8080/openai",
  "provider": "anthropic"
}
```
All Claude requests now flow through Bifrost, enabling cost tracking, token usage, and caching.
The same idea works for any tool that lets you register a custom OpenAI‑compatible provider – point it at Bifrost's `/v1` endpoint:

```yaml
custom:
  - name: "Bifrost"
    apiKey: "dummy"
    baseURL: "http://localhost:8080/v1"
    models:
      default: ["openai/gpt-4o"]
```
Model Context Protocol (MCP) – Tool Calling & Shared Context
```json
{
  "mcp": {
    "servers": [
      {
        "name": "filesystem",
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem"]
      },
      {
        "name": "brave-search",
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-brave-search"],
        "env": {
          "BRAVE_API_KEY": "your-key"
        }
      }
    ]
  }
}
```
Once configured, your LLM calls automatically gain access to MCP tools.
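Because the tools are attached at the gateway, the request itself stays a plain chat completion – a prompt that calls for a tool is enough (this sketch assumes Bifrost exposes the configured MCP tools to the model on each request):

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "List the files in the current directory"}]
  }'
```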
Deployment Examples
Docker (quick test)
```bash
docker run -p 8080:8080 \
  -e OPENAI_API_KEY=sk-... \
  maximhq/bifrost:latest
```
Docker Compose
```yaml
services:
  bifrost:
    image: maximhq/bifrost:latest
    ports:
      - "8080:8080"
    environment:
      - OPENAI_API_KEY=sk-...
    volumes:
      - ./data:/app/data
```
Kubernetes
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bifrost
spec:
  replicas: 3
  selector:
    matchLabels:
      app: bifrost
  template:
    metadata:
      labels:
        app: bifrost
    spec:
      containers:
        - name: bifrost
          image: maximhq/bifrost:latest
          ports:
            - containerPort: 8080
```
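To give in‑cluster clients a stable address, you would typically pair the Deployment with a Service (a minimal sketch; the name and ports are assumptions matching the Deployment above):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: bifrost
spec:
  selector:
    app: bifrost
  ports:
    - port: 80          # port clients inside the cluster call
      targetPort: 8080  # containerPort from the Deployment above
```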
Terraform
See the official documentation for full examples.
Summary
| Feature | Status |
|---|---|
| Observability | ✅ (auto‑traced, metrics, OTLP) |
| Semantic Caching | ✅ |
| Multi‑Key Load Balancing | ✅ |
| Provider Failover | ✅ |
| MCP Tool Calling | ✅ |
| One‑Line Integration | ✅ |
Migration steps (≈10 minutes):
- Run Bifrost.
- Add provider API keys (via UI or config).
- Update the base URL in your code.
- Test a single request.
- Deploy (Docker, K8s, etc.).
All features are enabled automatically – no code changes beyond the URL. Enjoy a resilient, observable, and scalable LLM stack with zero refactoring.
Quick Integration Checklist
- OpenAI‑compatible API
- One URL change
- Multi‑provider routing
- Built‑in observability
- No refactoring required
- No new SDKs
- No code rewrites
Just drop it in.
Built by the team at Maxim AI.