Tokens are now more expensive than juniors, and less predictable
Public Pricing (as of 2024)
| Provider / Model | Input price (per 1 M tokens) | Output price (per 1 M tokens) |
|---|---|---|
| OpenAI GPT‑5.4 | $2.50 | $15 |
| Anthropic Claude Sonnet 4.6 | $3.00 | $15 |
| Google Gemini 2.5 Pro | $1.25 (≤ 200 k tokens) / $2.50 (> 200 k tokens) | $10 (≤ 200 k tokens) / $15 (> 200 k tokens) |
These numbers look cheap if you only run a few prompts in a playground.
A Rough Cost Sketch for a 10‑Person Team
Assumptions, per seat per workday:
- 5 M input tokens
- 2 M output tokens
22 workdays ≈ 1 month.
| Provider / Model | Approx. monthly cost for 10 seats |
|---|---|
| OpenAI GPT‑5.4 | $9,350 |
| Anthropic Claude Sonnet 4.6 | $9,900 |
| Google Gemini 2.5 Pro | $5,775 – $9,350 (range reflects the two‑tier pricing) |
The Gemini range shows how a single model can swing between “cheap” and “expensive” depending on token usage patterns.
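To make the arithmetic reproducible, here is a minimal sketch of the calculation behind the table. The prices are the public rates quoted above; the seat count, workdays, and token volumes are this section's assumptions, not measured usage.

```python
# Minimal sketch of the table's arithmetic. Prices are the public
# per-1M-token rates quoted above; seats, workdays, and token volumes
# are this section's assumptions, not measured usage.

SEATS = 10
WORKDAYS = 22          # ~1 month
INPUT_M_PER_DAY = 5    # millions of input tokens, per seat per workday
OUTPUT_M_PER_DAY = 2   # millions of output tokens, per seat per workday

def monthly_cost(input_price: float, output_price: float) -> float:
    """Team-wide monthly cost at flat per-1M-token prices."""
    per_seat_day = INPUT_M_PER_DAY * input_price + OUTPUT_M_PER_DAY * output_price
    return per_seat_day * WORKDAYS * SEATS

print(f"GPT-5.4:           ${monthly_cost(2.50, 15):8,.0f}")   # 9,350
print(f"Claude Sonnet 4.6: ${monthly_cost(3.00, 15):8,.0f}")   # 9,900
# Gemini 2.5 Pro depends on which side of the 200k tier requests land;
# the two extremes bound the monthly bill.
print(f"Gemini, all <=200k: ${monthly_cost(1.25, 10):8,.0f}")  # 5,775
print(f"Gemini, all >200k:  ${monthly_cost(2.50, 15):8,.0f}")  # 9,350
```

Swapping in your own token volumes is the whole exercise; the structure of the bill matters more than any single rate.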
How This Stacks Up Against Salaries
| Role (BLS, 2024) | Annual wage | Monthly equivalent |
|---|---|---|
| Administrative assistant (secretary) | $47,460 | $3,955 |
| Software developer (median) | $133,080 | $11,090 |
| Software developer (10th percentile) | $79,850 | $6,654 |
Takeaway: A single engineer casually using a model is still cheaper than a junior developer, but company‑wide AI workflows can outpace junior labor costs very quickly. Five heavy AI seats at GPT‑5.4 rates (~$935/month each, ~$4,675 total) already exceed the $3,955 monthly cost of a median administrative assistant.
Why Token Spend Can Be a Hidden Cost
- Output is often the expensive half – OpenAI GPT‑5.4 charges 6× more for output than input. Teams that focus only on “sending a lot of context” miss the bulk of the bill.
- Tokenizer changes matter – Anthropic notes that Claude Opus 4.7’s new tokenizer can consume up to 35 % more tokens for the same text, causing a sudden cost jump with no workload change.
- Tiered pricing creates surprise – Gemini 2.5 Pro switches rates after 200 k tokens. Longer prompts, lower cache hit rates, or added features (e.g., grounding, search) can dramatically reshape the bill.
- Agents multiply the line items – with an AI agent you pay for (tallied in the sketch below):
  - The original prompt
  - Tool schemas & results
  - Chain‑of‑thought reasoning budgets (platform‑dependent)
  - Retries, file context, prior‑turn summaries, review passes, self‑correction loops, etc.
“The agent did the task in 8 minutes” often hides a blurrier marginal cost than the dashboard suggests.
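To make that multiplication concrete, here is a sketch of one hypothetical agent run's token ledger. Every event label and token count below is invented for illustration; the prices are the GPT‑5.4 rates from the table above.

```python
# Hypothetical token ledger for one agent run. Every event below is a
# separately billed item; the counts are made up for illustration.
from dataclasses import dataclass

@dataclass
class Event:
    label: str
    input_tokens: int
    output_tokens: int

# One "8-minute task" as the meter sees it (illustrative numbers).
run = [
    Event("original prompt",            2_000,      0),
    Event("tool schemas + results",     6_000,    500),
    Event("reasoning budget",               0,  4_000),
    Event("retry after tool error",     8_000,  1_200),
    Event("file context re-sent",      12_000,      0),
    Event("prior-turn summary",         1_500,    300),
    Event("review + self-correction",   9_000,  2_500),
]

INPUT_PRICE, OUTPUT_PRICE = 2.50, 15.00  # $/1M tokens (GPT-5.4 rates above)

total_in = sum(e.input_tokens for e in run)
total_out = sum(e.output_tokens for e in run)
cost = (total_in * INPUT_PRICE + total_out * OUTPUT_PRICE) / 1_000_000
print(f"{total_in:,} in / {total_out:,} out -> ${cost:.4f} per run")
# The prompt itself is only 2,000 of 38,500 input tokens here;
# everything else is agent overhead.
```

Pennies per run, until you multiply by retries, seats, and workdays.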
Recommendations (Do the Boring Things Early)
| Action | Why |
|---|---|
| Don’t benchmark on a single cute demo | One‑off tests hide long‑term cost patterns. |
| Match model capability to task | Not every task needs the frontier model. |
| Avoid using expensive models as a management substitute | Human oversight still adds value. |
| Tag and monitor token usage (see the sketch below) | Treat token spend like any other budget line item. |
| Apply the “AI vs human” framing carefully | Careful comparisons lead to better architecture and honest economics. |
| Use AI to amplify good people, not replace them | Humans remain owners of correctness, cost, and consequences. |
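In the same spirit, a minimal sketch of what “tag and monitor token usage” can look like. The ledger format, tag names, and price table are placeholders; the token counts are assumed to come from whatever per-request usage figures your provider's responses report.

```python
# Minimal usage ledger: attribute every request's tokens to a team,
# feature, and model so token spend shows up like any other budget line.
import csv
import time
from pathlib import Path

LEDGER = Path("token_usage.csv")
PRICES = {  # $/1M tokens; placeholder table, fill in your contracted rates
    "gpt-5.4": (2.50, 15.00),
    "claude-sonnet-4.6": (3.00, 15.00),
}

def record_usage(team: str, feature: str, model: str,
                 input_tokens: int, output_tokens: int) -> float:
    """Append one request's usage to the ledger and return its cost."""
    in_price, out_price = PRICES[model]
    cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    new_file = not LEDGER.exists()
    with LEDGER.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["ts", "team", "feature", "model",
                             "input_tokens", "output_tokens", "cost_usd"])
        writer.writerow([int(time.time()), team, feature, model,
                         input_tokens, output_tokens, f"{cost:.6f}"])
    return cost

# Call it right after each model response, using the token counts the
# provider reports back (e.g. a "usage" object on the response):
record_usage("payments", "pr-review-bot", "gpt-5.4",
             input_tokens=42_000, output_tokens=3_100)
```

Once every request lands in a ledger like this, token spend can be reviewed the same way as any other recurring invoice.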
Bottom Line
- Tokens are still useful, but they are no longer a cute rounding error.
- For many teams, token spend is becoming a real labor‑adjacent budget category.
- Don’t pretend tokens are magically cheaper than people – they come with a billing model that can change under your feet, a cost profile that can explode with usage patterns, and a nasty habit of looking cheap right until they aren’t.
My default stance:
Use AI aggressively, but never let the token budget operate without adult supervision.
References
- OpenAI, API Pricing – https://openai.com/pricing
- Anthropic, Claude Pricing – https://www.anthropic.com/pricing
- Google, Vertex AI Generative AI Pricing – https://cloud.google.com/vertex-ai/generative-ai/pricing
- U.S. Bureau of Labor Statistics, Software Developers, Quality Assurance Analysts, and Testers – https://www.bls.gov/oes/current/oes151132.htm
- U.S. Bureau of Labor Statistics, Secretaries and Administrative Assistants – https://www.bls.gov/oes/current/oes43-3000.htm