Open Source vs Proprietary LLMs: The Real Cost Breakdown

Published: February 19, 2026 at 07:35 AM EST
4 min read
Source: Dev.to
TL;DR

  • Below 1 B tokens/month – just use proprietary APIs.
  • 1 – 10 B tokens/month – hosted open‑source APIs (e.g., Together.ai, Groq) are usually the cheapest.
  • Above 10 B tokens/month – self‑hosting can win, but only if you already have an MLOps team.
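The thresholds above can be sketched as a tiny helper. The cut-offs are this article's rules of thumb, not hard limits:

```python
def deployment_recommendation(tokens_per_month: float) -> str:
    """Map monthly token volume to the article's rule-of-thumb tiers."""
    BILLION = 1e9
    if tokens_per_month < 1 * BILLION:
        return "proprietary API"
    elif tokens_per_month <= 10 * BILLION:
        return "hosted open-source API (Together.ai, Groq, ...)"
    else:
        return "consider self-hosting -- only with an existing MLOps team"

print(deployment_recommendation(500e6))  # -> proprietary API
```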

The “open source is free” narrative ignores $300 K – $600 K / year in engineering overhead.

Prices move fast. The numbers below are current as of February 2026 and are quoted per 1 M tokens (input / output).

Hosted‑API Pricing (per 1 M tokens)

| Model | Provider | Input | Output | Notes |
|---|---|---|---|---|
| Llama 4 Maverick | Together.ai | $0.27 | $0.85 | |
| Llama 4 Maverick | Groq | $0.20 | $0.60 | 562 tok/s |
| GPT‑OSS‑120B | Together.ai / Fireworks / Groq | $0.15 | $0.60 | |
| GPT‑OSS‑20B | Together.ai | $0.05 | $0.20 | “Bargain tier” |
| DeepSeek V3.1 | Together.ai | $0.60 | $1.70 | |
| Qwen3‑235B | Together.ai | $0.20 | $0.60 | |
| Mistral Small 3 | Together.ai | $0.10 | $0.30 | |

Proprietary‑API Pricing (per 1 M tokens)

| Model | Input | Output | Source |
|---|---|---|---|
| GPT‑5.2 | $1.75 | $14.00 | OpenAI |
| GPT‑5 mini | $0.25 | $2.00 | OpenAI |
| Claude Opus 4.6 | $5.00 | $25.00 | Anthropic |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Anthropic |
| Gemini 2.5 Flash | $0.30 | $2.50 | Google |

Quick observations

  • GPT‑OSS‑120B at $0.15 input is ≈ 11× cheaper than GPT‑5.2 on the input side.
  • GPT‑5 mini and Gemini 2.5 Flash sit in a middle ground where proprietary pricing gets surprisingly close to open‑source hosted rates.
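A worked check on blended cost per million tokens, using the table prices above. The 3:1 input-to-output ratio is an illustrative assumption, not a figure from the article:

```python
def blended_cost_per_million(input_price: float, output_price: float,
                             input_share: float = 0.75) -> float:
    """Blended $/M tokens for a workload where `input_share` of tokens are input."""
    return input_price * input_share + output_price * (1 - input_share)

gpt_oss_120b = blended_cost_per_million(0.15, 0.60)   # hosted open source
gpt_5_2      = blended_cost_per_million(1.75, 14.00)  # proprietary
print(f"GPT-OSS-120B: ${gpt_oss_120b:.4f}/M  "
      f"GPT-5.2: ${gpt_5_2:.4f}/M  "
      f"ratio: {gpt_5_2 / gpt_oss_120b:.1f}x")
```

With output tokens included, the gap widens beyond the 11× input-only comparison, because proprietary output pricing is disproportionately expensive.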

For a deeper dive on month‑over‑month trends, see the full pricing comparison (link in the original article).

The real decision space

| Option | Description |
|---|---|
| 1️⃣ Proprietary API | Pay OpenAI, Anthropic, or Google directly. |
| 2️⃣ Hosted open‑source API | Pay Together.ai, Groq, or Fireworks to run open models for you. |
| 3️⃣ Self‑hosted open source | Rent GPUs and run the models yourself. |

Option 2 is often overlooked. It gives you the flexibility of open weights without the operational burden, which makes it the sweet spot for most companies.

Option 3 looks attractive on paper, but in practice it’s a staffing decision masquerading as a technology decision.

Cost comparison: GPT‑OSS‑120B (Together.ai) vs. self‑hosting

Assumptions

  • Hosted price: $0.15 / $0.60 (input / output) via Together.ai.
  • Self‑hosted hardware: Lambda Labs H100 at $2.99 / hr (≈ $2,183 / mo).
  • A single H100 running a 70 B model ≈ 50 tokens / s → ≈ 130 M tokens / mo.
| Scale (tokens/mo) | Together.ai cost | Self‑hosted cost* | Winner |
|---|---|---|---|
| 10 M | ~$4.50 | $2,183 + engineering overhead | API (by a mile) |
| 100 M | ~$45 | $2,183 + engineering overhead | API |
| 1 B | ~$450 | $2,183 + engineering overhead | Roughly even on compute, but API wins on total cost |
| 10 B | ~$4,500 | ~$17 K compute (8 × H100) + engineering overhead | Depends on your team |

*Compute‑only crossover is around 1 – 2 B tokens/month; engineering overhead pushes the break‑even point higher.
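A quick sanity check on those assumptions. The batching multiplier in the final comment is my own rough figure, not from the article:

```python
HOURS_PER_MONTH = 730          # ~24 * 365 / 12
H100_HOURLY = 2.99             # Lambda Labs on-demand rate from the article

def monthly_capacity(tokens_per_sec: float) -> float:
    """Tokens one GPU can serve per month at 100% utilization."""
    return tokens_per_sec * 3600 * HOURS_PER_MONTH

def self_host_cost_per_million(tokens_per_sec: float,
                               hourly: float = H100_HOURLY) -> float:
    """Compute-only $/M tokens for a fully utilized GPU."""
    monthly_cost = hourly * HOURS_PER_MONTH  # ~$2,183/mo, matching the article
    return monthly_cost / (monthly_capacity(tokens_per_sec) / 1e6)

cap = monthly_capacity(50)             # ~131 M tokens/month at 50 tok/s
cost = self_host_cost_per_million(50)  # ~$16.6/M -- far above hosted rates
# Single-stream decoding alone never beats the API: batched serving must
# raise effective throughput (often by an order of magnitude or more) to
# pull the compute-only crossover down into the ~1-2 B tokens/month range.
```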

Cloud‑GPU pricing impact

| Provider | Instance | Hourly cost | Notes |
|---|---|---|---|
| AWS | H100 (on‑demand) | ~$3.90/hr | Higher than Lambda Labs |
| AWS | H100 (reserved) | $1.85/hr | Requires 1‑year commitment |
| Fireworks | H200 | $6.00/hr | More throughput per dollar |
| Fireworks | B200 | $9.00/hr | Even more throughput, higher cost |

Even with reserved instances, the economics still favor APIs for most workloads.
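Translating those hourly rates into monthly figures makes the comparison concrete (assuming 730 hours/month at 100% utilization):

```python
HOURS_PER_MONTH = 730

# Hourly rates from the tables above
gpu_hourly = {
    "Lambda Labs H100":         2.99,
    "AWS H100 on-demand":       3.90,
    "AWS H100 reserved (1-yr)": 1.85,
    "Fireworks H200":           6.00,
    "Fireworks B200":           9.00,
}

for name, rate in gpu_hourly.items():
    print(f"{name:26s} ${rate * HOURS_PER_MONTH:8,.0f}/month")
```

Even the reserved H100 at roughly $1,350/month buys you a fixed cost before a single token is served, which is why low-volume workloads never recover it.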

Hidden costs of self‑hosting

  • MLOps team: $300 K – $600 K / yr (2 – 4 engineers).
  • Operational overhead: monitoring, alerting, model versioning, rollback procedures, GPU utilization tuning (30 % – 50 % waste), security patching, compliance audits, on‑call rotations.
  • Upgrade treadmill: new model releases → re‑run evaluation, re‑tune, redeploy. With an API you merely change the model string.

These costs never appear in a simple $/token calculation but are real budget items.
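These hidden costs can be folded into a rough total-cost-of-ownership sketch. The staffing midpoint and utilization figure come from the bullets above; the over-provisioning model and the ~$0.45/M blended rate (implied by the scale table) are my own simplifications:

```python
def self_host_tco_per_month(gpu_count: int, gpu_hourly: float = 2.99,
                            team_cost_per_year: float = 450_000,
                            utilization: float = 0.6) -> float:
    """Rough monthly TCO: compute over-provisioned for wasted utilization,
    plus staffing (midpoint of the article's $300K-$600K range)."""
    compute = gpu_count * gpu_hourly * 730 / utilization
    staffing = team_cost_per_year / 12
    return compute + staffing

def hosted_api_cost_per_month(tokens_per_month: float,
                              blended_price_per_million: float = 0.45) -> float:
    """Hosted open-source API cost at the table's implied blended rate."""
    return tokens_per_month / 1e6 * blended_price_per_million

# At 10 B tokens/month with the table's 8 x H100:
print(f"self-host TCO: ${self_host_tco_per_month(8):,.0f}/mo")
print(f"hosted API:    ${hosted_api_cost_per_month(10e9):,.0f}/mo")
```

Under these assumptions the hosted API stays an order of magnitude cheaper even at 10 B tokens/month, because staffing dominates the compute bill.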

When self‑hosting makes sense

  1. Compliance & data sovereignty – Healthcare, finance, or any regulated industry that requires data to stay on‑premises (HIPAA, GDPR). No BAA negotiations, no reliance on a provider’s compliance claims.
  2. Air‑gapped environments – Defense, certain government agencies, and some financial institutions that cannot send data to external APIs.
  3. Fine‑tuning at scale
    • OpenAI’s GPT‑4.1 fine‑tuning: $25 / M tokens.
    • Open