LLM Pricing in February 2026: What Every Model Actually Costs
Source: Dev.to
TL;DR
Cheapest option: OpenAI’s open‑source GPT‑OSS‑20B at $0.05 /M input.
Best price/performance: GPT‑5 mini at $0.25 /M input.
Most expensive: Grok‑4 at $30 /M input (≈ 600× the cost of GPT‑OSS‑20B).
Pricing Table (All prices per million tokens)
| Model | Provider | Input | Output | Notes |
|---|---|---|---|---|
| GPT‑5.2 | OpenAI | $1.75 | $14.00 | Flagship, best overall quality |
| GPT‑5 mini | OpenAI | $0.25 | $2.00 | Best price/performance ratio |
| GPT‑4.1 | OpenAI | $2.00 | $8.00 | Still widely deployed |
| GPT‑4.1 nano | OpenAI | $0.10 | $0.40 | Cheapest OpenAI option |
| o4‑mini | OpenAI | $1.10 | $4.40 | Reasoning model |
| Claude Opus 4.6 | Anthropic | $5.00 | $25.00 | Top‑tier reasoning + coding |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | Workhorse model |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | Fast + cheap |
| GPT‑OSS‑120B | OpenAI (open‑source) | $0.15 | $0.60 | Open‑weight, via hosted APIs |
| GPT‑OSS‑20B | OpenAI (open‑source) | $0.05 | $0.20 | Smallest open‑weight option |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 | Strong on long context |
| Gemini 2.0 Flash | Google | $0.10 | $0.40 | Budget tier |
| Llama 4 Maverick | Meta (via API) | $0.27 | $0.85 | Open‑weight, self‑hostable |
| DeepSeek V3.1 | DeepSeek | $0.60 | $1.70 | Chinese lab, surprisingly strong |
| Grok‑4 | xAI | $30.00 | $150.00 | Most expensive model on market |
| Grok‑4‑fast | xAI | $2.00 | $5.00 | xAI’s mid‑tier |
| Grok‑3 | xAI | $30.00 | $150.00 | Previous gen, same price as Grok‑4 |
| Grok‑3‑mini | xAI | $3.00 | $5.00 | Budget reasoning |
Sources: OpenAI pricing, Anthropic models, Google AI pricing, xAI pricing, DeepSeek pricing, Together.ai, Groq for open‑source model hosting (checked Feb 19 2026).
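To turn the per-million rates above into a per-request figure, multiply each token count by its rate and divide by one million. A minimal sketch, using a few prices copied from the table (the dictionary and function names are illustrative, not any provider's API):

```python
# Rough per-request cost estimator using the per-million-token prices
# from the table above. Check each provider for current rates.

PRICES = {  # model -> (input $/M tokens, output $/M tokens)
    "gpt-5-mini": (0.25, 2.00),
    "claude-sonnet-4.6": (3.00, 15.00),
    "gemini-2.0-flash": (0.10, 0.40),
    "grok-4": (30.00, 150.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the given token counts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a 2,000-token prompt with a 500-token reply on GPT-5 mini.
print(f"${request_cost('gpt-5-mini', 2_000, 500):.6f}")  # $0.001500
```

The same request on Grok-4 costs $0.135, i.e. 90× more, which is why the input/output split in the table matters more than any single headline number.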
Key Takeaways
- Input vs. output cost: Output tokens cost several times more than input — up to 8× for flagship models, and at least 2× for nearly everything in the table. For apps that generate long responses, output dominates the bill.
- Caching: OpenAI and Anthropic offer prompt‑caching that can cut repeat‑context costs by 50–90 %.
- Quality gap shrinking: GPT‑5 mini, Claude Sonnet 4.6, and Gemini 2.5 Flash now compete closely for most tasks. Premium models (GPT‑5.2, Claude Opus 4.6) still lead on complex reasoning and long‑form analysis.
- Latency matters: A cheaper model with high latency can cost more in user drop‑off than a model that’s 2× pricier but responds faster. Benchmark latency alongside cost.
- Self‑hosting advantage: Open‑weight models (e.g., Llama 4 Maverick) can drop effective input cost below $0.10 /M when self‑hosted on GPUs, making them attractive for >10 B tokens/month workloads.
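The caching point above is easy to quantify. A sketch assuming a 90% discount on cached input tokens (the top of the 50–90% range cited; real discounts and cache-write surcharges vary by provider, and the function here is purely illustrative):

```python
# Sketch of how prompt caching changes the input bill. The 90% cached-input
# discount is an assumption; real provider discounts vary within 50-90%.

def input_cost(total_input_tokens: int, cached_fraction: float,
               price_per_m: float, cache_discount: float = 0.90) -> float:
    """Input-token cost in dollars when `cached_fraction` of tokens hit the cache."""
    cached = total_input_tokens * cached_fraction
    fresh = total_input_tokens - cached
    return (fresh * price_per_m + cached * price_per_m * (1 - cache_discount)) / 1e6

# 10M input tokens/day on Claude Sonnet 4.6 ($3.00/M), where 80% is a
# repeated system prompt + document context:
print(f"${input_cost(10_000_000, 0.0, 3.00):.2f}/day")  # $30.00/day, no cache
print(f"${input_cost(10_000_000, 0.8, 3.00):.2f}/day")  # $8.40/day, with cache
```

At that hit rate the input bill drops 72%, which is why heavy system prompts and RAG contexts are the first place to apply caching.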
Recommendations by Use‑Case
| Use‑case | Recommended Model(s) | Reason |
|---|---|---|
| High‑volume production (chatbots, classification, extraction) | GPT‑5 mini or Gemini 2.0 Flash | < $0.50 /M input, solid quality |
| Code generation | Claude Sonnet 4.6 or GPT‑5.2 | Sonnet excels at complex coding instructions; GPT‑5.2 handles multi‑file refactoring |
| Research & analysis | Claude Opus 4.6 (budget permitting) or GPT‑5.2 | Opus 4.6 offers top‑tier reasoning; GPT‑5.2 is a strong alternative |
| Cost‑sensitive startups | Llama 4 Maverick (self‑hosted) or GPT‑4.1 nano (API) | Lowest entry cost while maintaining acceptable quality |
| Budget‑first experimentation | GPT‑OSS‑20B | Cheapest open‑weight option at $0.05 /M input |
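For the high‑volume row, the recommendations translate into concrete monthly bills. A sketch at an assumed volume of 1B input and 200M output tokens per month (the volume is hypothetical; the prices come from the table above):

```python
# Monthly-bill comparison for a high-volume chatbot at an assumed
# 1B input + 200M output tokens/month, using the table's prices.

MODELS = {  # model -> (input $/M, output $/M)
    "GPT-5 mini": (0.25, 2.00),
    "Gemini 2.0 Flash": (0.10, 0.40),
    "GPT-4.1 nano": (0.10, 0.40),
}

IN_TOKENS, OUT_TOKENS = 1_000_000_000, 200_000_000

def monthly_bill(input_price: float, output_price: float) -> float:
    """Total monthly cost in dollars at the assumed volume."""
    return (IN_TOKENS * input_price + OUT_TOKENS * output_price) / 1e6

for name, (pin, pout) in MODELS.items():
    print(f"{name}: ${monthly_bill(pin, pout):,.2f}/month")
```

GPT‑5 mini comes to $650/month versus $180 for Gemini 2.0 Flash at this volume; note that output tokens account for $400 of the GPT‑5 mini bill, echoing the "output dominates" takeaway above.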
Outlook
- Pricing trend: Equivalent‑quality pricing has dropped roughly 10× per year over the past three years. Expect GPT‑5 mini‑level quality at ≤ $0.05 /M input by Q4 2026.
- Infrastructure shift: Custom silicon (Google TPUs, Amazon Trainium, Microsoft Maia) is beginning to undercut Nvidia GPU economics. As these scale, hosted API pricing may fall faster than self‑hosting costs, potentially flipping the build‑vs‑buy decision for mid‑size companies.