LLMCap – A proxy that hard-stops LLM API calls when you hit a dollar cap

Published: 3 weeks ago (May 18, 2026 at 11:56 PM EDT)

Source: Hacker News

Hard dollar caps

LLMCap puts a hard dollar cap on every LLM call. When your spend reaches the cap (for example, $50), requests stop. It is not an alert; it is a stop. Setup is a one-line code change, so there are no surprise bills.

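# your_app.py
from anthropic import Anthropic

# Before: requests go straight to Anthropic
client = Anthropic(api_key="sk-ant-...")

# After: the same client, routed through the LLMCap proxy
client = Anthropic(
    api_key="sk-ant-...",
    base_url="https://proxy.llmcap.io/anthropic"
)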
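
If you want to switch the proxy on and off without editing code, one option is to read the proxy URL from the environment. The sketch below is only an illustration: the variable name LLMCAP_PROXY_URL and the helper client_kwargs are hypothetical and are not part of LLMCap or the Anthropic SDK.

```python
import os

# Hypothetical variable name, chosen for this sketch only.
PROXY_ENV_VAR = "LLMCAP_PROXY_URL"


def client_kwargs(api_key, env=None):
    """Build keyword arguments for Anthropic(...).

    base_url is added only when the proxy URL is set, so the same code
    talks to the provider directly when the variable is absent.
    """
    env = os.environ if env is None else env
    kwargs = {"api_key": api_key}
    proxy_url = env.get(PROXY_ENV_VAR, "").strip()
    if proxy_url:
        kwargs["base_url"] = proxy_url
    return kwargs
```

With this helper, the client would be created as Anthropic(**client_kwargs("sk-ant-...")), and setting LLMCAP_PROXY_URL=https://proxy.llmcap.io/anthropic would route calls through the proxy.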

Works with every major provider

Anthropic
OpenAI
Google Gemini
Mistral
Cohere

Setup in 5 minutes

How LLMCap works

Providers supported: 5
Avg added latency: <35 ms
Requests blocked today: 18,742
Uptime: 99.9%

Available everywhere you code.

Works in your workflow

VS Code Extension

Live spend in your status bar. Click to see today’s usage, burn rate, and blocked count — without leaving the editor.

Terminal CLI (PyPI)

Check spend, browse logs, and manage keys from the command line. Works on macOS, Linux, and Windows.

pip install llmcap

Windows Tray App (Desktop)

System tray icon shows live spend. Right‑click for stats and quick actions. Always visible, never intrusive.

pip install "llmcap[tray]"

Pick your plan

3‑day trial, no charge until it ends · Cancel anytime

Starter – $19 /mo (after 3‑day trial)

2 API keys
All 5 providers
Daily & monthly caps
30‑day audit log
1 user
Email support

Credit card required for trial. Cancel before day 3 and you won’t be charged.

Questions

Does LLMCap ever see or store my API keys?
LLMCap never stores or logs them. Your provider API key (e.g. sk-ant-...) travels through the proxy in a request header on each request and is discarded immediately afterward. The only key LLMCap stores is your LLMCap proxy key, hashed with bcrypt.

Does it work with streaming responses?
Yes. Streaming is supported from day one. LLMCap passes SSE chunks through in real time. If the budget is exceeded mid-stream, LLMCap sends a final 429 event and then closes the connection. The token that triggered the cap is not charged.
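
A mid-stream cutoff of this kind can be sketched as a generator that wraps the provider's chunk stream. This is not LLMCap's implementation: the function name capped_stream, the flat per-chunk cost, and the shape of the final error event are all assumptions made for illustration (real billing is per token, not per chunk).

```python
from decimal import Decimal


def capped_stream(chunks, remaining_usd, cost_per_chunk_usd):
    """Pass SSE chunks through until the budget cannot cover the next one.

    When the next chunk would exceed the remaining budget, that chunk is
    not forwarded; a final error event is emitted and the stream ends.
    """
    for chunk in chunks:
        if remaining_usd < cost_per_chunk_usd:
            # Assumed event shape; the source only says "a final 429 event".
            yield 'event: error\ndata: {"status": 429, "message": "budget exceeded"}\n\n'
            return
        remaining_usd -= cost_per_chunk_usd
        yield chunk


# Usage: a budget of $0.02 at $0.01 per chunk lets two chunks through.
upstream = ["data: a\n\n", "data: b\n\n", "data: c\n\n"]
received = list(capped_stream(upstream, Decimal("0.02"), Decimal("0.01")))
```

Here received holds the first two chunks followed by the error event; the third chunk is never forwarded.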

What exactly happens when the cap is hit?
The next incoming request is rejected with HTTP 429 before it reaches the provider. Because the request is never forwarded, no tokens are consumed and you are not billed for it. Your app receives the same 429 response structure providers use for rate limiting, so existing error handling works as-is.
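
The pre-request check described above might look roughly like the sketch below. It is an illustration under stated assumptions, not LLMCap's code: the BudgetGate class and its method names are hypothetical, and the error body is modeled on Anthropic's rate-limit error format as a guess at what "the same 429 response structure" means.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class BudgetGate:
    """Tracks spend against a cap and decides whether to forward a request."""

    cap_usd: Decimal
    spent_usd: Decimal = Decimal("0")

    def admit(self):
        """Return (status, body): 429 with an error body once the cap is reached."""
        if self.spent_usd >= self.cap_usd:
            body = {
                "type": "error",
                "error": {"type": "rate_limit_error", "message": "budget cap reached"},
            }
            return 429, body
        return 200, None  # caller forwards the request to the provider

    def record(self, cost_usd):
        """Add the cost of a completed request to the running total."""
        self.spent_usd += cost_usd


# Usage: with a $50 cap, requests pass until recorded spend reaches $50.
gate = BudgetGate(cap_usd=Decimal("50"))
gate.record(Decimal("49.99"))
status_before, _ = gate.admit()
gate.record(Decimal("0.01"))
status_after, body_after = gate.admit()
```

In this sketch status_before is 200 and status_after is 429. Using Decimal instead of float keeps the comparison against the cap exact.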

Can I self‑host LLMCap?
Self-hosting is on the roadmap. The proxy is open source (FastAPI + Redis). For now, the managed service at proxy.llmcap.io is the recommended path: it is already deployed worldwide, with an average added latency under 35 ms.
