LLM Pricing in February 2026: What Every Model Actually Costs

Published: February 19, 2026, 07:35 AM EST
4 min read
Source: Dev.to

TL;DR

Cheapest option: OpenAI’s open‑source GPT‑OSS‑20B at $0.05 /M input.
Best price/performance: GPT‑5 mini at $0.25 /M input.
Most expensive: Grok‑4 at $30 /M input (≈ 600× the cost of GPT‑OSS‑20B).


Pricing Table (All prices per million tokens)

| Model | Provider | Input | Output | Notes |
|---|---|---|---|---|
| GPT‑5.2 | OpenAI | $1.75 | $14.00 | Flagship, best overall quality |
| GPT‑5 mini | OpenAI | $0.25 | $2.00 | Best price/performance ratio |
| GPT‑4.1 | OpenAI | $2.00 | $8.00 | Still widely deployed |
| GPT‑4.1 nano | OpenAI | $0.10 | $0.40 | Cheapest OpenAI option |
| o4‑mini | OpenAI | $1.10 | $4.40 | Reasoning model |
| Claude Opus 4.6 | Anthropic | $5.00 | $25.00 | Top‑tier reasoning + coding |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | Workhorse model |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | Fast + cheap |
| GPT‑OSS‑120B | OpenAI (open‑source) | $0.15 | $0.60 | Open‑weight, via hosted APIs |
| GPT‑OSS‑20B | OpenAI (open‑source) | $0.05 | $0.20 | Smallest open‑weight option |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 | Strong on long context |
| Gemini 2.0 Flash | Google | $0.10 | $0.40 | Budget tier |
| Llama 4 Maverick | Meta (via API) | $0.27 | $0.85 | Open‑weight, self‑hostable |
| DeepSeek V3.1 | DeepSeek | $0.60 | $1.70 | Chinese lab, surprisingly strong |
| Grok‑4 | xAI | $30.00 | $150.00 | Most expensive model on market |
| Grok‑4‑fast | xAI | $2.00 | $5.00 | xAI’s mid‑tier |
| Grok‑3 | xAI | $30.00 | $150.00 | Previous gen, same price as Grok‑4 |
| Grok‑3‑mini | xAI | $3.00 | $5.00 | Budget reasoning |

Sources: OpenAI pricing, Anthropic models, Google AI pricing, xAI pricing, DeepSeek pricing, Together.ai, Groq for open‑source model hosting (checked Feb 19 2026).
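Per‑request cost from the table is simple arithmetic: tokens divided by one million, times the rate, summed over input and output. A minimal sketch, using a few of the rates quoted above (verify them against the providers’ pricing pages before budgeting):

```python
# Per-million-token rates copied from the February 2026 table above.
# (input $/M tokens, output $/M tokens)
PRICING = {
    "gpt-5-mini":        (0.25, 2.00),
    "claude-sonnet-4.6": (3.00, 15.00),
    "gemini-2.5-flash":  (0.30, 2.50),
    "grok-4":            (30.00, 150.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request: tokens / 1e6 * rate, summed per side."""
    in_rate, out_rate = PRICING[model]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A typical chat turn: 2,000 input tokens, 500 output tokens.
print(f"{request_cost('gpt-5-mini', 2000, 500):.6f}")  # 0.001500
print(f"{request_cost('grok-4', 2000, 500):.6f}")      # 0.135000
```

The same chat turn costs 90× more on Grok‑4 than on GPT‑5 mini, which is why model choice, not prompt tuning, is usually the first cost lever to pull.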


Key Takeaways

  • Input vs. output cost: Output tokens typically run 3–8× the input price across providers. For apps that generate long responses, output dominates the bill.
  • Caching: OpenAI and Anthropic offer prompt‑caching that can cut repeat‑context costs by 50–90 %.
  • Quality gap shrinking: GPT‑5 mini, Claude Sonnet 4.6, and Gemini 2.5 Flash now compete closely for most tasks. Premium models (GPT‑5.2, Opus 4.6) still lead on complex reasoning and long‑form analysis.
  • Latency matters: A cheaper model with high latency can cost more in user drop‑off than a model that’s 2× pricier but responds faster. Benchmark latency alongside cost.
  • Self‑hosting advantage: Open‑weight models (e.g., Llama 4 Maverick) can drop effective input cost below $0.10 /M when self‑hosted on GPUs, making them attractive for >10 B tokens/month workloads.
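The first two takeaways can be made concrete with a back‑of‑envelope monthly bill. The sketch below assumes GPT‑5 mini rates from the table and a hypothetical 90 % discount on cached input tokens; actual discounts and cache‑write charges vary by provider:

```python
# GPT-5 mini rates from the table above, $/M tokens.
IN_RATE, OUT_RATE = 0.25, 2.00

def monthly_cost(in_tok_m: float, out_tok_m: float,
                 cached_share: float = 0.0,
                 cache_discount: float = 0.9) -> float:
    """Monthly bill in dollars; in_tok_m / out_tok_m are millions of tokens.

    cached_share is the fraction of input tokens served from the prompt
    cache; cache_discount is the assumed price cut on those tokens.
    """
    cached = in_tok_m * cached_share
    fresh = in_tok_m - cached
    input_cost = fresh * IN_RATE + cached * IN_RATE * (1 - cache_discount)
    return input_cost + out_tok_m * OUT_RATE

# 1B input and 250M output tokens/month; 70% of input is a repeated system prompt.
print(monthly_cost(1000, 250))                    # 750.0 (no caching)
print(monthly_cost(1000, 250, cached_share=0.7))  # 592.5 (with caching)
```

Note that even with 70 % of input cached, output tokens still account for most of the bill: response length matters more than prompt length for cost.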

Recommendations by Use‑Case

| Use‑case | Recommended Model(s) | Reason |
|---|---|---|
| High‑volume production (chatbots, classification, extraction) | GPT‑5 mini or Gemini 2.0 Flash | < $0.50 /M input, solid quality |
| Code generation | Claude Sonnet 4.6 or GPT‑5.2 | Sonnet excels at complex coding instructions; GPT‑5.2 handles multi‑file refactoring |
| Research & analysis | Claude Opus 4.6 (budget permitting) or GPT‑5.2 | Opus 4.6 offers top‑tier reasoning; GPT‑5.2 is a strong alternative |
| Cost‑sensitive startups | Llama 4 Maverick (self‑hosted) or GPT‑4.1 nano (API) | Lowest entry cost while maintaining acceptable quality |
| Budget‑first experimentation | GPT‑OSS‑20B | Cheapest open‑weight option at $0.05 /M input |

Outlook

  • Pricing trend: Equivalent‑quality pricing has dropped roughly 10× per year over the past three years. Expect GPT‑5 mini‑level quality at ≤ $0.05 /M input by Q4 2026.
  • Infrastructure shift: Custom silicon (Google TPUs, Amazon Trainium, Microsoft Maia) is beginning to undercut Nvidia GPU economics. As these scale, hosted API pricing may fall faster than self‑hosting costs, potentially flipping the build‑vs‑buy decision for mid‑size companies.
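If the roughly 10×‑per‑year trend holds, the Q4 2026 projection follows from simple exponential decay. A sketch, purely an extrapolation rather than a forecast:

```python
def projected_rate(current_rate: float, years: float,
                   decline_per_year: float = 10.0) -> float:
    """Project a per-million-token rate forward, assuming a constant
    10x-per-year price decline (the trend claimed above)."""
    return current_rate / decline_per_year ** years

# GPT-5 mini input today: $0.25/M. Three quarters out (~Q4 2026):
print(f"${projected_rate(0.25, 0.75):.3f}/M")  # $0.044/M
```

Under this assumption, GPT‑5 mini‑level input pricing lands around $0.044 /M by Q4 2026, consistent with the ≤ $0.05 /M estimate above.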

