LLM Pricing in February 2026: What Every Model Actually Costs
Source: Dev.to
TL;DR
Cheapest option: OpenAI’s open‑source GPT‑OSS‑20B at $0.05 /M input.
Best price/performance: GPT‑5 mini at $0.25 /M input.
Most expensive: Grok‑4 at $30 /M input (≈ 600× the cost of GPT‑OSS‑20B).
Pricing Table (All prices per million tokens)
| Model | Provider | Input | Output | Notes |
|---|---|---|---|---|
| GPT‑5.2 | OpenAI | $1.75 | $14.00 | Flagship, best overall quality |
| GPT‑5 mini | OpenAI | $0.25 | $2.00 | Best price/performance ratio |
| GPT‑4.1 | OpenAI | $2.00 | $8.00 | Still widely deployed |
| GPT‑4.1 nano | OpenAI | $0.10 | $0.40 | Cheapest OpenAI option |
| o4‑mini | OpenAI | $1.10 | $4.40 | Reasoning model |
| Claude Opus 4.6 | Anthropic | $5.00 | $25.00 | Top‑tier reasoning + coding |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | Workhorse model |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | Fast + cheap |
| GPT‑OSS‑120B | OpenAI (open‑source) | $0.15 | $0.60 | Open‑weight, via hosted APIs |
| GPT‑OSS‑20B | OpenAI (open‑source) | $0.05 | $0.20 | Smallest open‑weight option |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 | Strong on long context |
| Gemini 2.0 Flash | Google | $0.10 | $0.40 | Budget tier |
| Llama 4 Maverick | Meta (via API) | $0.27 | $0.85 | Open‑weight, self‑hostable |
| DeepSeek V3.1 | DeepSeek | $0.60 | $1.70 | Chinese lab, surprisingly strong |
| Grok‑4 | xAI | $30.00 | $150.00 | Most expensive model on market |
| Grok‑4‑fast | xAI | $2.00 | $5.00 | xAI’s mid‑tier |
| Grok‑3 | xAI | $30.00 | $150.00 | Previous gen, same price as Grok‑4 |
| Grok‑3‑mini | xAI | $3.00 | $5.00 | Budget reasoning |
Sources: OpenAI pricing, Anthropic models, Google AI pricing, xAI pricing, DeepSeek pricing, Together.ai, Groq for open‑source model hosting (checked Feb 19 2026).
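To turn the per-million rates above into a per-request figure, multiply each token count by its rate and divide by one million. A minimal sketch, using a few prices copied from the table (the dictionary and function names are illustrative, not any provider's API):

```python
# Rough per-request cost estimator using the per-million-token prices
# from the table above. Check each provider for current rates.

PRICES = {  # model -> (input $/M tokens, output $/M tokens)
    "gpt-5-mini": (0.25, 2.00),
    "claude-sonnet-4.6": (3.00, 15.00),
    "gemini-2.0-flash": (0.10, 0.40),
    "grok-4": (30.00, 150.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the given token counts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a 2,000-token prompt with a 500-token reply on GPT-5 mini.
print(f"${request_cost('gpt-5-mini', 2_000, 500):.6f}")  # $0.001500
```

The same request on Grok-4 costs $0.135, i.e. 90× more, which is why the input/output split in the table matters more than any single headline number.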
Key Takeaways
- Input vs. output cost: Output tokens cost several times more than input — up to 8× for flagship models, and at least 2× for nearly everything in the table. For apps that generate long responses, output dominates the bill.
- Caching: OpenAI and Anthropic offer prompt‑caching that can cut repeat‑context costs by 50–90 %.
- Quality gap shrinking: GPT‑5 mini, Claude Sonnet 4.6, and Gemini 2.5 Flash now compete closely for most tasks. Premium models (GPT‑5.2, Claude Opus 4.6) still lead on complex reasoning and long‑form analysis.
- Latency matters: A cheaper model with high latency can cost more in user drop‑off than a model that’s 2× pricier but responds faster. Benchmark latency alongside cost.
- Self‑hosting advantage: Open‑weight models (e.g., Llama 4 Maverick) can drop effective input cost below $0.10 /M when self‑hosted on GPUs, making them attractive for >10 B tokens/month workloads.
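The caching point above is easy to quantify. A sketch assuming a 90% discount on cached input tokens (the top of the 50–90% range cited; real discounts and cache-write surcharges vary by provider, and the function here is purely illustrative):

```python
# Sketch of how prompt caching changes the input bill. The 90% cached-input
# discount is an assumption; real provider discounts vary within 50-90%.

def input_cost(total_input_tokens: int, cached_fraction: float,
               price_per_m: float, cache_discount: float = 0.90) -> float:
    """Input-token cost in dollars when `cached_fraction` of tokens hit the cache."""
    cached = total_input_tokens * cached_fraction
    fresh = total_input_tokens - cached
    return (fresh * price_per_m + cached * price_per_m * (1 - cache_discount)) / 1e6

# 10M input tokens/day on Claude Sonnet 4.6 ($3.00/M), where 80% is a
# repeated system prompt + document context:
print(f"${input_cost(10_000_000, 0.0, 3.00):.2f}/day")  # $30.00/day, no cache
print(f"${input_cost(10_000_000, 0.8, 3.00):.2f}/day")  # $8.40/day, with cache
```

At that hit rate the input bill drops 72%, which is why heavy system prompts and RAG contexts are the first place to apply caching.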
Recommendations by Use‑Case
| Use‑case | Recommended Model(s) | Reason |
|---|---|---|
| High‑volume production (chatbots, classification, extraction) | GPT‑5 mini or Gemini 2.0 Flash | < $0.50 /M input, solid quality |
| Code generation | Claude Sonnet 4.6 or GPT‑5.2 | Sonnet excels at complex coding instructions; GPT‑5.2 handles multi‑file refactoring |
| Research & analysis | Claude Opus 4.6 (budget permitting) or GPT‑5.2 | Opus 4.6 offers top‑tier reasoning; GPT‑5.2 is a strong alternative |
| Cost‑sensitive startups | Llama 4 Maverick (self‑hosted) or GPT‑4.1 nano (API) | Lowest entry cost while maintaining acceptable quality |
| Budget‑first experimentation | GPT‑OSS‑20B | Cheapest open‑weight option at $0.05 /M input |
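For the high‑volume row, the recommendations translate into concrete monthly bills. A sketch at an assumed volume of 1B input and 200M output tokens per month (the volume is hypothetical; the prices come from the table above):

```python
# Monthly-bill comparison for a high-volume chatbot at an assumed
# 1B input + 200M output tokens/month, using the table's prices.

MODELS = {  # model -> (input $/M, output $/M)
    "GPT-5 mini": (0.25, 2.00),
    "Gemini 2.0 Flash": (0.10, 0.40),
    "GPT-4.1 nano": (0.10, 0.40),
}

IN_TOKENS, OUT_TOKENS = 1_000_000_000, 200_000_000

def monthly_bill(input_price: float, output_price: float) -> float:
    """Total monthly cost in dollars at the assumed volume."""
    return (IN_TOKENS * input_price + OUT_TOKENS * output_price) / 1e6

for name, (pin, pout) in MODELS.items():
    print(f"{name}: ${monthly_bill(pin, pout):,.2f}/month")
```

GPT‑5 mini comes to $650/month versus $180 for Gemini 2.0 Flash at this volume; note that output tokens account for $400 of the GPT‑5 mini bill, echoing the "output dominates" takeaway above.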
Outlook
- Pricing trend: Equivalent‑quality pricing has dropped roughly 10× per year over the past three years. Expect GPT‑5 mini‑level quality at ≤ $0.05 /M input by Q4 2026.
- Infrastructure shift: Custom silicon (Google TPUs, Amazon Trainium, Microsoft Maia) is beginning to undercut Nvidia GPU economics. As these scale, hosted API pricing may fall faster than self‑hosting costs, potentially flipping the build‑vs‑buy decision for mid‑size companies.