I Spent $12 on 4 AI Questions. Then Linux Foundation Made MCP Official.

Published: December 15, 2025 at 08:10 AM EST
4 min read
Source: Dev.to

Why I Chose Assistants API (And Why You Probably Did Too)

Let me be honest: Assistants API is genuinely impressive. The developer experience is incredible. Here’s what pulled me in:

The Promise

  • Built‑in RAG out of the box
  • Persistent conversation threads
  • Automatic tool calling
  • File upload and instant querying
  • “Just works” in 2 hours

The Appeal

As someone running FPL Hub (2,000+ users, 500 K+ daily API calls), I know the value of managed infrastructure. Assistants API felt like the right abstraction layer. Why manage chunking strategies, vector stores, and context windows when OpenAI handles it all?

I uploaded a PDF, asked my questions, and got accurate responses. The prototype worked beautifully—until I checked my bill.
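
For context, the prototype flow was minimal. The sketch below assumes the openai Node SDK's beta Assistants endpoints; the file name and question are placeholders, not my actual data.

// Prototype sketch: upload a PDF, enable File Search, ask a question in a thread
import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Upload the PDF for retrieval
const file = await client.files.create({
  file: fs.createReadStream("report.pdf"),
  purpose: "assistants",
});

// Assistant with the File Search tool enabled
const assistant = await client.beta.assistants.create({
  model: "gpt-4o",
  tools: [{ type: "file_search" }],
});

// Each conversation is a thread; history accumulates server-side
const thread = await client.beta.threads.create();
await client.beta.threads.messages.create(thread.id, {
  role: "user",
  content: "Summarize the key points in this document.",
  attachments: [{ file_id: file.id, tools: [{ type: "file_search" }] }],
});

// Run the assistant and wait for the answer
const run = await client.beta.threads.runs.createAndPoll(thread.id, {
  assistant_id: assistant.id,
});
const answer = await client.beta.threads.messages.list(thread.id, { run_id: run.id });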

The Hidden Cost Structure Nobody Warns You About

OpenAI’s pricing page lists:

  • GPT‑4o: $5 input / $15 output per 1 M tokens
  • Code Interpreter: $0.03 per session
  • File Search: $0.10 / GB / day

That looks reasonable, but the actual charges can be surprising.

The Real Math for My “Simple” Query

The PDF was 10 pages (~5K tokens). A single question against it consumed roughly:

  • Vector Store automatic chunking → 50,000 tokens
  • Retrieval augmentation per query → 20,000 tokens
  • Context window (conversation history) → 8,000 tokens
  • Tool call overhead → 3,000 tokens
  • Your actual query + response → 250 tokens

Total per question: ~81,000 tokens = $0.81
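
In code form, the same back-of-the-envelope math looks like this; the blended ~$10 per 1 M tokens is an assumption on my part that happens to reproduce the observed charge.

// Per-question cost estimate using the token counts from the breakdown above
const perQuestionTokens = {
  vectorStoreChunking: 50_000,
  retrievalAugmentation: 20_000,
  conversationContext: 8_000,
  toolCallOverhead: 3_000,
  queryAndResponse: 250,
};

const totalTokens = Object.values(perQuestionTokens).reduce((sum, t) => sum + t, 0);
const blendedRatePerMillion = 10; // assumed USD per 1M tokens (input/output blend)
const costPerQuestion = (totalTokens / 1_000_000) * blendedRatePerMillion;

console.log(totalTokens.toLocaleString(), costPerQuestion.toFixed(2)); // 81,250  0.81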

Four questions broke down like this

  • Model costs: $3.24 (324 K tokens)
  • Code Interpreter sessions: $0.06
  • File Search storage (3 days): $0.30
  • Hidden retrieval costs: $8.87

Total: $12.47

Why Costs Spiral

  1. Token multiplication you can’t control – Assistants API automatically chunks documents for vector search. A 5 K‑token PDF becomes ~50 K tokens in storage, and each retrieval multiplies that further.
  2. Context window bloat – Every follow‑up question reloads the entire conversation history. Question 1 costs $0.81; by question 4 the cost rises to $3.50 because of accumulated context (a toy model of this follows the list).
  3. Storage fees compound daily – $0.10 / GB / day adds up quickly:
    • 1 GB document ≈ $3 /month
    • 10 GB knowledge base ≈ $30 /month
  4. Hidden retrieval costs – The File Search tool not only retrieves chunks; it also augments each query with those chunks, incurring embedding, similarity search, and prompt token costs multiplied by conversation history.
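
Point 2 is the one that surprised me most, so here is a toy model of it. The carried-context figure is an illustrative assumption chosen to mirror the $0.81 → ~$3.50 growth described above, not a measured value.

// Toy model of context bloat: each follow-up re-sends the accumulated history,
// so tokens (and cost) per question grow roughly linearly.
const firstQuestionTokens = 81_250;   // total from the per-question breakdown
const carriedContextPerTurn = 90_000; // assumed history re-sent on each follow-up
const blendedRatePerMillion = 10;     // assumed USD per 1M tokens

for (let q = 1; q <= 4; q++) {
  const tokens = firstQuestionTokens + (q - 1) * carriedContextPerTurn;
  const cost = (tokens / 1_000_000) * blendedRatePerMillion;
  console.log(`Question ${q}: ~${tokens.toLocaleString()} tokens ≈ $${cost.toFixed(2)}`);
}
// Question 1: ~81,250 tokens ≈ $0.81 ... Question 4: ~351,250 tokens ≈ $3.51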

Real‑World Cost Projections

Customer support bot (1 K conversations/day)

  • 5 messages per conversation
  • 2 knowledge‑base documents (≈500 pages)
  • Storage: $6 /day → $180 /month
  • Queries: ~300 K tokens/day → $300 /day

Total: ≈ $9,180/month
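
That total is just the per-day figures over a 30-day month, as a quick sanity check shows:

// Sanity check: support-bot projection from the per-day figures above
const storagePerDay = 6;   // USD/day for the knowledge-base documents
const queriesPerDay = 300; // USD/day for query traffic
const monthly = 30 * (storagePerDay + queriesPerDay);
console.log(`$${monthly.toLocaleString()}/month`); // $9,180/month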

Document analysis app

  • User uploads 5 PDFs (≈250 pages)
  • 10 questions per document, 3 follow‑ups each

Cost per user session: $45
100 users: $4,500/month

My actual use case

  • 4 test questions, 1 small PDF (10 pages), 2 conversation threads

Cost: $12.47 → projected $3,100/month at 1 K users.

The MCP Alternative: Same Features, 99 % Cost Reduction

What is MCP?

Model Context Protocol (MCP) is an open standard for connecting AI models to data sources and tools—think USB‑C for AI. As of December 9, 2025, it’s an official Linux Foundation project.

Founding members include Anthropic, OpenAI, Google, Microsoft, AWS, Cloudflare, Bloomberg, and Block.

Architecture Comparison

Traditional Assistants API flow

flowchart LR
    A[User] --> B[OpenAI API]
    B --> C[Thread Storage]
    B --> D[Vector Store]
    B --> E[GPT‑4]
    E --> F[Response]
    style C fill:#f9f,stroke:#333,stroke-width:2px
    style D fill:#f9f,stroke:#333,stroke-width:2px

Metered components: vector store storage ($0.10 / GB / day), retrieval augmentation, and token usage (including re-sent thread history).

MCP flow

flowchart LR
    A[User] --> B[MCP Client]
    B --> C[Your MCP Server]
    C --> D[Cloudflare Workers]
    D --> E[Any Model]
    E --> F[Response]

You control storage and retrieval; Cloudflare Workers provide 10 M free requests/month.

Key Architectural Differences

  1. Client‑side memory – Conversation state is stored on the client, eliminating daily storage fees (see the sketch after this list).

  2. Multi‑model support – One MCP server can route requests to any model:

    // Switch models per request
    const response = await mcp.callTool("search_documents", {
      query: userQuery,
      model: "groq/llama-3.3-70b-versatile" // Free tier
    });
  3. Edge deployment on Cloudflare Workers – Deploy globally in minutes with no cold starts:

    export default {
      async fetch(request, env) {
        const mcp = new MCPServer(env);
        return mcp.handle(request);
      }
    };
  4. Complete cost control – You decide chunk limits, caching, and model pricing before sending a request:

    const searchConfig = {
      maxChunks: 3,
      chunkSize: 500,
      cacheStrategy: "lru",
      model: "groq-free"
    };
    
    const estimatedCost = calculateTokens(chunks) * modelPrice;
    if (estimatedCost > threshold) {
      // fallback to cheaper model or reduce chunks
    }
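
To make point 1 concrete, here is a minimal sketch of client-side memory. The ConversationMemory class and the mcp stub are illustrative, not part of the MCP SDK: the point is that the client owns the history and decides how much of it to send, so nothing is billed per GB per day.

// Client-side conversation memory (illustrative sketch, not an MCP SDK API)
interface Turn {
  role: "user" | "assistant";
  content: string;
}

class ConversationMemory {
  private turns: Turn[] = [];

  add(turn: Turn) {
    this.turns.push(turn);
  }

  // Only the most recent turns travel with each request
  recent(maxTurns = 4): Turn[] {
    return this.turns.slice(-maxTurns);
  }
}

// Stand-in for the MCP client used in the earlier snippets (hypothetical shape)
const mcp = {
  async callTool(name: string, args: Record<string, unknown>): Promise<string> {
    return `result of ${name}`; // placeholder implementation
  },
};

const memory = new ConversationMemory();
memory.add({ role: "user", content: "Which documents mention pricing?" });

const response = await mcp.callTool("search_documents", {
  query: "pricing",
  history: memory.recent(), // trimmed context, chosen by the client
});
memory.add({ role: "assistant", content: response });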

My MCP Implementation

// MCP Server on Cloudflare Workers
import { MCPServer } from "@modelcontextprotocol/sdk";

interface MCPTools {
  search_documents: (query: string, maxChunks?: number) => Promise<string>;
  analyze_pdf: (fileId: string) => Promise<string>;
  summarize_conversation: () => Promise<string>;
}

// Cost breakdown for the same 4 questions:
const costs = {
  workersAI_embeddings: 0.011 / 1000, // ≈ $0.011 per 1 K tokens (example rate)
  vectorize_storage: 0,               // Included in free tier
  // ...additional cost items as needed
};

Using MCP, the same four‑question workflow costs roughly 99 % less than the $12.47 spent with the Assistants API, demonstrating how an open protocol can dramatically reduce the cost of AI‑driven applications.
