Open Source vs Proprietary LLMs: The Real Cost Breakdown
Source: Dev.to
TL;DR
- Below 1 B tokens/month – just use proprietary APIs.
- 1 – 10 B tokens/month – hosted open‑source APIs (e.g., Together.ai, Groq) are usually the cheapest.
- Above 10 B tokens/month – self‑hosting can win, but only if you already have an MLOps team.
The “open source is free” narrative ignores $300 K – $600 K / year in engineering overhead.
Prices move fast. The numbers below are current as of February 2026 and are quoted per 1 M tokens (input / output).
Hosted‑API Pricing (per 1 M tokens)
| Model | Provider | Input | Output | Notes |
|---|---|---|---|---|
| Llama 4 Maverick | Together.ai | $0.27 | $0.85 | |
| Llama 4 Maverick | Groq | $0.20 | $0.60 | 562 tok/s |
| GPT‑OSS‑120B | Together.ai / Fireworks / Groq | $0.15 | $0.60 | |
| GPT‑OSS‑20B | Together.ai | $0.05 | $0.20 | “Bargain tier” |
| DeepSeek V3.1 | Together.ai | $0.60 | $1.70 | |
| Qwen3‑235B | Together.ai | $0.20 | $0.60 | |
| Mistral Small 3 | Together.ai | $0.10 | $0.30 | |
Proprietary‑API Pricing (per 1 M tokens)
| Model | Input | Output | Source |
|---|---|---|---|
| GPT‑5.2 | $1.75 | $14.00 | OpenAI |
| GPT‑5 mini | $0.25 | $2.00 | OpenAI |
| Claude Opus 4.6 | $5.00 | $25.00 | Anthropic |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Anthropic |
| Gemini 2.5 Flash | $0.30 | $2.50 | Google |
Quick observations
- GPT‑OSS‑120B at $0.15 input is roughly 12× cheaper than GPT‑5.2’s $1.75 on the input side.
- GPT‑5 mini and Gemini 2.5 Flash sit in a middle ground where proprietary pricing gets surprisingly close to open‑source hosted rates.
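To make those gaps concrete, here is a quick sketch. Prices are the February 2026 figures from the tables above; the traffic mix (100 M input / 20 M output tokens per month) is an assumption for illustration:

```python
# Monthly API cost from per-1M-token prices (article's Feb 2026 figures;
# prices drift, so treat the outputs as illustrative, not quotes).

def monthly_cost(input_tokens, output_tokens, price_in, price_out):
    """Cost in dollars; prices are quoted per 1M tokens."""
    return (input_tokens / 1e6) * price_in + (output_tokens / 1e6) * price_out

# Assumed workload: 100M input + 20M output tokens per month.
gpt_oss_120b = monthly_cost(100e6, 20e6, 0.15, 0.60)   # Together.ai hosted
gpt_5_2      = monthly_cost(100e6, 20e6, 1.75, 14.00)  # OpenAI

print(f"GPT-OSS-120B: ${gpt_oss_120b:,.2f}")   # $27.00
print(f"GPT-5.2:      ${gpt_5_2:,.2f}")        # $455.00
```

The same workload costs roughly 17× more on GPT‑5.2 than on hosted GPT‑OSS‑120B, because the output-side gap ($14.00 vs $0.60) dominates once responses get long.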
For a deeper dive on month‑over‑month trends, see the full pricing comparison (link in the original article).
The real decision space
| Option | Description |
|---|---|
| 1️⃣ Proprietary API | Pay OpenAI, Anthropic, or Google directly. |
| 2️⃣ Hosted open‑source API | Pay Together.ai, Groq, or Fireworks to run open models for you. |
| 3️⃣ Self‑hosted open source | Rent GPUs and run the models yourself. |
Option 2 is often overlooked. It gives you the flexibility of open weights without the operational burden—the sweet spot for most companies.
Option 3 looks attractive on paper, but in practice it’s a staffing decision masquerading as a technology decision.
Cost comparison: GPT‑OSS‑120B (Together.ai) vs. self‑hosting
Assumptions
- Hosted price: $0.15 / $0.60 (input / output) via Together.ai.
- Self‑hosted hardware: Lambda Labs H100 at $2.99 / hr (≈ $2,183 / mo).
- A single H100 running a 70 B model ≈ 50 tokens / s single‑stream → ≈ 130 M tokens / mo (batched serving raises aggregate throughput well above this).
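The 130 M figure follows directly from the throughput assumption; a quick sanity check, assuming 24/7 utilization at the single‑stream rate:

```python
# Sanity-check the assumption: 50 tok/s sustained, 24/7, over a 30-day month.
tok_per_s = 50
seconds_per_month = 60 * 60 * 24 * 30            # 2,592,000 s
tokens_per_month = tok_per_s * seconds_per_month
print(f"{tokens_per_month / 1e6:.0f}M tokens/mo")    # ≈ 130M, matching above

# Compute-only cost per 1M tokens on a $2,183/mo H100 at this rate:
print(f"${2183 / (tokens_per_month / 1e6):.2f} per 1M tokens")
```

At single‑stream rates, self‑hosted compute works out to roughly $17 per 1 M tokens, far above hosted rates; the crossover in the table below only materializes with batched serving that pushes aggregate throughput well past 50 tok/s.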
| Scale (tokens/mo) | Together.ai cost | Self‑hosted cost* | Winner |
|---|---|---|---|
| 10 M | ~ $4.50 | $2,183 + engineering overhead | API (by a mile) |
| 100 M | ~ $45 | $2,183 + engineering overhead | API |
| 1 B | ~ $450 | $2,183 + engineering overhead | Compute costs start to converge, but API wins on total cost |
| 10 B | ~ $4,500 | ~ $17 K compute (8 × H100) + engineering overhead | Depends on your team |
*Compute‑only crossover is around 1 – 2 B tokens/month; engineering overhead pushes the break‑even point higher.
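One way to see how the flat overhead moves the break‑even. This is a rough sketch: the blended rate, per‑GPU throughput, and overhead figure are all assumptions drawn from the estimates above, not measurements:

```python
import math

# Rough break-even sketch. Assumed figures, not vendor quotes:
# a blended hosted rate, ~1B tokens/mo per GPU (batched serving),
# and the low end of the article's $300K-$600K/yr MLOps estimate.
HOSTED_PER_M = 0.45                 # blended $/1M tokens (between $0.15 and $0.60)
GPU_MONTH = 2183                    # one H100 at $2.99/hr
ENGINEERING_MONTH = 300_000 / 12    # flat overhead, regardless of volume

def self_hosted_cost(tokens_m: float, tokens_per_gpu_m: float = 1000) -> float:
    """Compute + overhead for a given monthly volume (in millions of tokens)."""
    gpus = math.ceil(tokens_m / tokens_per_gpu_m)
    return gpus * GPU_MONTH + ENGINEERING_MONTH

for tokens_m in (100, 1_000, 10_000):   # 100M, 1B, 10B tokens/mo
    api = tokens_m * HOSTED_PER_M
    diy = self_hosted_cost(tokens_m)
    print(f"{tokens_m:>6,}M tokens/mo: API ${api:>8,.0f} vs self-hosted ${diy:>8,.0f}")
```

Under these assumptions the flat engineering line dominates until volumes reach many billions of tokens per month, which is why the TL;DR reserves self‑hosting for teams already past 10 B.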
Cloud‑GPU pricing impact
| Provider | Instance | Hourly cost | Notes |
|---|---|---|---|
| AWS | H100 (on‑demand) | ~$3.90 / hr | Higher than Lambda Labs |
| AWS | H100 (reserved) | $1.85 / hr | Requires 1‑year commitment |
| Fireworks | H200 | $6.00 / hr | More throughput per dollar |
| Fireworks | B200 | $9.00 / hr | Even more throughput, higher cost |
Even with reserved instances, the economics still favor APIs for most workloads.
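For reference, those hourly rates translate to roughly the following monthly figures (at 730 hours/month, i.e. 100 % utilization):

```python
# Monthly GPU rental at the quoted hourly rates (730 hr ≈ one month).
rates = {
    "AWS H100 on-demand": 3.90,
    "AWS H100 reserved": 1.85,
    "Lambda Labs H100": 2.99,
}
for name, hourly in rates.items():
    print(f"{name}: ${hourly * 730:,.0f}/mo")
```

The Lambda Labs figure lands at the ~$2,183/mo used in the assumptions above; even the reserved AWS rate (~$1,350/mo per GPU) still has to clear the same engineering overhead.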
Hidden costs of self‑hosting
- MLOps team: $300 K – $600 K / yr (2 – 4 engineers).
- Operational overhead: monitoring, alerting, model versioning, rollback procedures, GPU utilization tuning (30 % – 50 % waste), security patching, compliance audits, on‑call rotations.
- Upgrade treadmill: new model releases → re‑run evaluation, re‑tune, redeploy. With an API you merely change the model string.
These costs never appear in a simple $/token calculation but are real budget items.
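The "change the model string" point holds because most hosted providers expose OpenAI‑compatible endpoints. A sketch of what an upgrade looks like in practice (the base URLs are the providers' documented OpenAI‑compatible endpoints; the model IDs are placeholders):

```python
# Illustrative: with OpenAI-compatible endpoints, switching models or even
# providers is a config change, not a redeploy. Model IDs are placeholders.

def llm_config(provider: str, model: str) -> dict:
    """Map a provider name to its OpenAI-compatible base URL."""
    base_urls = {
        "openai": "https://api.openai.com/v1",
        "together": "https://api.together.xyz/v1",
        "groq": "https://api.groq.com/openai/v1",
    }
    return {"base_url": base_urls[provider], "model": model}

# An "upgrade" is a one-line diff:
old = llm_config("together", "org/old-model-id")   # placeholder model ID
new = llm_config("together", "org/new-model-id")   # placeholder model ID
```

Pass `base_url` and `model` straight into an OpenAI‑compatible client; the self‑hosted equivalent of that one‑line diff is re‑provisioning GPUs, re‑running evals, and redeploying.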
When self‑hosting makes sense
- Compliance & data sovereignty – Healthcare, finance, or any regulated industry that requires data to stay on‑premises (HIPAA, GDPR). No BAA negotiations, no reliance on a provider’s compliance claims.
- Air‑gapped environments – Defense, certain government agencies, and some financial institutions that cannot send data to external APIs.
- Fine‑tuning at scale
  - OpenAI’s GPT‑4.1 fine‑tuning: $25 / M tokens.
  - Open