Open Source vs Proprietary LLMs: The Real Cost Breakdown
Source: Dev.to
TL;DR
- Below 1 B tokens/month – just use proprietary APIs.
- 1 – 10 B tokens/month – hosted open‑source APIs (e.g., Together.ai, Groq) are usually the cheapest.
- Above 10 B tokens/month – self‑hosting can win, but only if you already have an MLOps team.
The “open source is free” narrative ignores $300 K – $600 K / year in engineering overhead.
Prices move fast. The numbers below are current as of February 2026 and are quoted per 1 M tokens (input / output).
Hosted‑API Pricing (per 1 M tokens)
| Model | Provider | Input | Output | Notes |
|---|---|---|---|---|
| Llama 4 Maverick | Together.ai | $0.27 | $0.85 | |
| Llama 4 Maverick | Groq | $0.20 | $0.60 | 562 tok/s |
| GPT‑OSS‑120B | Together.ai / Fireworks / Groq | $0.15 | $0.60 | |
| GPT‑OSS‑20B | Together.ai | $0.05 | $0.20 | “Bargain tier” |
| DeepSeek V3.1 | Together.ai | $0.60 | $1.70 | |
| Qwen3‑235B | Together.ai | $0.20 | $0.60 | |
| Mistral Small 3 | Together.ai | $0.10 | $0.30 | |
Proprietary‑API Pricing (per 1 M tokens)
| Model | Input | Output | Source |
|---|---|---|---|
| GPT‑5.2 | $1.75 | $14.00 | OpenAI |
| GPT‑5 mini | $0.25 | $2.00 | OpenAI |
| Claude Opus 4.6 | $5.00 | $25.00 | Anthropic |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Anthropic |
| Gemini 2.5 Flash | $0.30 | $2.50 | Google |
Quick observations
- GPT‑OSS‑120B at $0.15 input is roughly 12× cheaper than GPT‑5.2’s $1.75 on the input side.
- GPT‑5 mini and Gemini 2.5 Flash sit in a middle ground where proprietary pricing gets surprisingly close to open‑source hosted rates.
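To make those gaps concrete, here is a quick sketch. Prices are the February 2026 figures from the tables above; the traffic mix (100 M input / 20 M output tokens per month) is an assumption for illustration:

```python
# Monthly API cost from per-1M-token prices (article's Feb 2026 figures;
# prices drift, so treat the outputs as illustrative, not quotes).

def monthly_cost(input_tokens, output_tokens, price_in, price_out):
    """Cost in dollars; prices are quoted per 1M tokens."""
    return (input_tokens / 1e6) * price_in + (output_tokens / 1e6) * price_out

# Assumed workload: 100M input + 20M output tokens per month.
gpt_oss_120b = monthly_cost(100e6, 20e6, 0.15, 0.60)   # Together.ai hosted
gpt_5_2      = monthly_cost(100e6, 20e6, 1.75, 14.00)  # OpenAI

print(f"GPT-OSS-120B: ${gpt_oss_120b:,.2f}")   # $27.00
print(f"GPT-5.2:      ${gpt_5_2:,.2f}")        # $455.00
```

The same workload costs roughly 17× more on GPT‑5.2 than on hosted GPT‑OSS‑120B, because the output-side gap ($14.00 vs $0.60) dominates once responses get long.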
For a deeper dive on month‑over‑month trends, see the full pricing comparison (link in the original article).
The real decision space
| Option | Description |
|---|---|
| 1️⃣ Proprietary API | Pay OpenAI, Anthropic, or Google directly. |
| 2️⃣ Hosted open‑source API | Pay Together.ai, Groq, or Fireworks to run open models for you. |
| 3️⃣ Self‑hosted open source | Rent GPUs and run the models yourself. |
Option 2 is often overlooked. It gives you the flexibility of open weights without the operational burden—the sweet spot for most companies.
Option 3 looks attractive on paper, but in practice it’s a staffing decision masquerading as a technology decision.
Cost comparison: GPT‑OSS‑120B (Together.ai) vs. self‑hosting
Assumptions
- Hosted price: $0.15 / $0.60 (input / output) via Together.ai.
- Self‑hosted hardware: Lambda Labs H100 at $2.99 / hr (≈ $2,183 / mo).
- A single H100 running a 70 B model ≈ 50 tokens / s single‑stream → ≈ 130 M tokens / mo (batched serving raises aggregate throughput well above this).
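The 130 M figure follows directly from the throughput assumption; a quick sanity check, assuming 24/7 utilization at the single‑stream rate:

```python
# Sanity-check the assumption: 50 tok/s sustained, 24/7, over a 30-day month.
tok_per_s = 50
seconds_per_month = 60 * 60 * 24 * 30            # 2,592,000 s
tokens_per_month = tok_per_s * seconds_per_month
print(f"{tokens_per_month / 1e6:.0f}M tokens/mo")    # ≈ 130M, matching above

# Compute-only cost per 1M tokens on a $2,183/mo H100 at this rate:
print(f"${2183 / (tokens_per_month / 1e6):.2f} per 1M tokens")
```

At single‑stream rates, self‑hosted compute works out to roughly $17 per 1 M tokens, far above hosted rates; the crossover in the table below only materializes with batched serving that pushes aggregate throughput well past 50 tok/s.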
| Scale (tokens/mo) | Together.ai cost | Self‑hosted cost* | Winner |
|---|---|---|---|
| 10 M | ~ $4.50 | $2,183 + engineering overhead | API (by a mile) |
| 100 M | ~ $45 | $2,183 + engineering overhead | API |
| 1 B | ~ $450 | $2,183 + engineering overhead | Compute costs start to converge, but API wins on total cost |
| 10 B | ~ $4,500 | ~ $17 K compute (8 × H100) + engineering overhead | Depends on your team |
*Compute‑only crossover is around 1 – 2 B tokens/month; engineering overhead pushes the break‑even point higher.
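One way to see how the flat overhead moves the break‑even. This is a rough sketch: the blended rate, per‑GPU throughput, and overhead figure are all assumptions drawn from the estimates above, not measurements:

```python
import math

# Rough break-even sketch. Assumed figures, not vendor quotes:
# a blended hosted rate, ~1B tokens/mo per GPU (batched serving),
# and the low end of the article's $300K-$600K/yr MLOps estimate.
HOSTED_PER_M = 0.45                 # blended $/1M tokens (between $0.15 and $0.60)
GPU_MONTH = 2183                    # one H100 at $2.99/hr
ENGINEERING_MONTH = 300_000 / 12    # flat overhead, regardless of volume

def self_hosted_cost(tokens_m: float, tokens_per_gpu_m: float = 1000) -> float:
    """Compute + overhead for a given monthly volume (in millions of tokens)."""
    gpus = math.ceil(tokens_m / tokens_per_gpu_m)
    return gpus * GPU_MONTH + ENGINEERING_MONTH

for tokens_m in (100, 1_000, 10_000):   # 100M, 1B, 10B tokens/mo
    api = tokens_m * HOSTED_PER_M
    diy = self_hosted_cost(tokens_m)
    print(f"{tokens_m:>6,}M tokens/mo: API ${api:>8,.0f} vs self-hosted ${diy:>8,.0f}")
```

Under these assumptions the flat engineering line dominates until volumes reach many billions of tokens per month, which is why the TL;DR reserves self‑hosting for teams already past 10 B.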
Cloud‑GPU pricing impact
| Provider | Instance | Hourly cost | Notes |
|---|---|---|---|
| AWS | H100 (on‑demand) | ~$3.90 / hr | Higher than Lambda Labs |
| AWS | H100 (reserved) | $1.85 / hr | Requires 1‑year commitment |
| Fireworks | H200 | $6.00 / hr | More throughput per dollar |
| Fireworks | B200 | $9.00 / hr | Even more throughput, higher cost |
Even with reserved instances, the economics still favor APIs for most workloads.
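For reference, those hourly rates translate to roughly the following monthly figures (at 730 hours/month, i.e. 100 % utilization):

```python
# Monthly GPU rental at the quoted hourly rates (730 hr ≈ one month).
rates = {
    "AWS H100 on-demand": 3.90,
    "AWS H100 reserved": 1.85,
    "Lambda Labs H100": 2.99,
}
for name, hourly in rates.items():
    print(f"{name}: ${hourly * 730:,.0f}/mo")
```

The Lambda Labs figure lands at the ~$2,183/mo used in the assumptions above; even the reserved AWS rate (~$1,350/mo per GPU) still has to clear the same engineering overhead.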
Hidden costs of self‑hosting
- MLOps team: $300 K – $600 K / yr (2 – 4 engineers).
- Operational overhead: monitoring, alerting, model versioning, rollback procedures, GPU utilization tuning (30 % – 50 % waste), security patching, compliance audits, on‑call rotations.
- Upgrade treadmill: new model releases → re‑run evaluation, re‑tune, redeploy. With an API you merely change the model string.
These costs never appear in a simple $/token calculation but are real budget items.
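The "change the model string" point holds because most hosted providers expose OpenAI‑compatible endpoints. A sketch of what an upgrade looks like in practice (the base URLs are the providers' documented OpenAI‑compatible endpoints; the model IDs are placeholders):

```python
# Illustrative: with OpenAI-compatible endpoints, switching models or even
# providers is a config change, not a redeploy. Model IDs are placeholders.

def llm_config(provider: str, model: str) -> dict:
    """Map a provider name to its OpenAI-compatible base URL."""
    base_urls = {
        "openai": "https://api.openai.com/v1",
        "together": "https://api.together.xyz/v1",
        "groq": "https://api.groq.com/openai/v1",
    }
    return {"base_url": base_urls[provider], "model": model}

# An "upgrade" is a one-line diff:
old = llm_config("together", "org/old-model-id")   # placeholder model ID
new = llm_config("together", "org/new-model-id")   # placeholder model ID
```

Pass `base_url` and `model` straight into an OpenAI‑compatible client; the self‑hosted equivalent of that one‑line diff is re‑provisioning GPUs, re‑running evals, and redeploying.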
When self‑hosting makes sense
- Compliance & data sovereignty – Healthcare, finance, or any regulated industry that requires data to stay on‑premises (HIPAA, GDPR). No BAA negotiations, no reliance on a provider’s compliance claims.
- Air‑gapped environments – Defense, certain government agencies, and some financial institutions that cannot send data to external APIs.
- Fine‑tuning at scale
  - OpenAI’s GPT‑4.1 fine‑tuning: $25 / M tokens.
  - Open