Choosing the Right LLM for the Umbraco CMS Developer MCP: A Quick Cost and Performance Analysis

Published: January 11, 2026 at 08:35 AM EST
6 min read
Source: Dev.to

Why Efficiency Matters

When we moved beyond proof‑of‑concept late last year, two realisations surfaced:

  • With subscription‑based services like Claude Pro or ChatGPT Plus, inefficiencies are often hidden. You pay a flat fee and never see the true cost of each operation.
  • It’s easy to ignore efficiency when the bill doesn’t change—until you hit usage limits or get rate‑limited mid‑workflow.

Three key factors

  1. Speed = friction – A workflow that takes 40 s instead of 20 s isn’t just slower; developers lose focus, context‑switch, or abandon the tool entirely.
  2. More tokens = more computation
    • Higher latency – each token adds processing time.
    • Faster limit consumption – subscriptions and APIs both have token caps.
    • Compounding inefficiency – wasteful prompts multiply across every operation.
  3. Hidden costs surface
    • Scale up → subscription limits are hit, rate limiting kicks in.
    • Multiple seats → what works for one developer becomes expensive across a team.
    • Switch to API pricing → pay‑per‑token models expose every inefficiency immediately.

The difference between $3 and $13 per 100 operations is the difference between a sustainable tool and an expensive experiment.

Efficient prompts and capable models that reason in fewer tokens compound savings across every operation.

Computational Sustainability

More efficient models that finish tasks with fewer tokens and less time have a smaller environmental footprint. When you’re running thousands of AI operations, a model that’s 30 % faster isn’t just about saving seconds—it’s about responsible resource usage.

Gaining Visibility with the Claude Agent SDK

We recently integrated the Claude Agent SDK into our evaluation test suite (similar to acceptance tests for websites). This gave us visibility into what actually happens during AI‑powered workflows.

For each test run we now track:

| Metric | What it tells us |
| --- | --- |
| Execution time | How long the workflow takes |
| Conversation turns | Number of back‑and‑forth exchanges with the LLM |
| Token usage | Input + output tokens consumed |
| Cost | Actual USD spent per operation |

This data transformed our understanding of how different models perform with Umbraco MCP.
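A minimal sketch of how such per‑run data can be collected. The result‑object shape here (`duration_ms`, `num_turns`, `usage`, `total_cost_usd`) is an illustrative assumption, not a guaranteed Claude Agent SDK contract — check the SDK's own types before relying on it:

```typescript
// The four metrics we track per evaluation run.
interface EvalMetrics {
  executionSeconds: number;  // how long the workflow took
  conversationTurns: number; // back-and-forth exchanges with the LLM
  totalTokens: number;       // input + output tokens consumed
  costUsd: number;           // actual USD spent on the run
}

// Hypothetical shape of the final result message from an agent run.
interface AgentResult {
  duration_ms: number;
  num_turns: number;
  usage: { input_tokens: number; output_tokens: number };
  total_cost_usd: number;
}

// Normalise a raw result into the metrics row we log per test run.
function toMetrics(result: AgentResult): EvalMetrics {
  return {
    executionSeconds: result.duration_ms / 1000,
    conversationTurns: result.num_turns,
    totalTokens: result.usage.input_tokens + result.usage.output_tokens,
    costUsd: result.total_cost_usd,
  };
}
```

Logging one such row per run is what makes the per‑model comparisons later in this post possible.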

Prompt Engineering: Deliberate Optimisation

We’re not just throwing prompts at models and hoping for the best. Our evaluation prompts are deliberately optimised for smaller, faster models.

What that looks like

  • Explicit task lists – numbered steps rather than open‑ended instructions.
  • Clear variable tracking – “Save the folder ID for later use” instead of assuming the model will infer it.
  • Specific tool guidance – “Use the image ID from step 3, NOT the folder ID” to prevent confusion.
  • Defined success criteria – exact strings to output on completion.

We reduce the cognitive load on the model by giving structured, unambiguous instructions that even smaller models can follow reliably.

Trade‑off: more verbose prompts → consistent results across model tiers.

And it works—Umbraco MCP performs well even with smaller, faster models when the prompts are clear.
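As a sketch, those principles can be packaged into a small prompt builder — `buildTaskPrompt` is a hypothetical helper for illustration, not part of Umbraco MCP:

```typescript
// Hypothetical helper: turns a task list into the kind of explicit,
// numbered prompt described above.
interface TaskPromptOptions {
  tasks: string[];        // one imperative instruction per step
  successMessage: string; // exact string the model must output on completion
}

function buildTaskPrompt({ tasks, successMessage }: TaskPromptOptions): string {
  // Numbered steps rather than open-ended instructions.
  const steps = tasks.map((task, i) => `${i + 1}. ${task}`);
  // Defined success criteria: an exact string to output when done.
  steps.push(`${tasks.length + 1}. When complete, say '${successMessage}'`);
  return `Complete these tasks in order:\n${steps.join("\n")}`;
}

const prompt = buildTaskPrompt({
  tasks: ["Get the media root to see the current structure",
          'Create a media folder called "_Test Media Folder" at the root'],
  successMessage: "The workflow has completed successfully",
});
```

Generating prompts this way keeps the structure (numbering, explicit success string) consistent across every evaluation scenario.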

Test Scenarios

Our test suite is still limited—we’re in early stages. Consider this an interesting experiment rather than rigorous benchmarking. That said, we designed two representative scenarios:

  1. Basic 3‑step operation – create a data‑type folder, verify it exists, delete it.
  2. 10‑step media lifecycle – create folder, upload image, update metadata, check references, move to recycle bin, restore, permanently delete image, delete folder.

The complex workflow test looks like this:

```typescript
const TEST_PROMPT = `Complete these tasks in order:
1. Get the media root to see the current structure
2. Create a media folder called "_Test Media Folder" at the root
   - IMPORTANT: Save the folder ID returned from this call for later use
3. Create a test image media item INSIDE the new folder with name "_Test Image"
   - Use the folder ID from step 2 as the parentId
   - IMPORTANT: Save the image ID returned from this call
4. Update the IMAGE to change its name to "_Test Image Updated"
   - Use the image ID from step 3, NOT the folder ID
5. Check if the IMAGE is referenced anywhere
6. Move the IMAGE to the recycle bin
   - Use the image ID from step 3, NOT the folder ID
7. Restore the IMAGE from the recycle bin
8. Delete the IMAGE permanently
9. Delete the FOLDER
10. When complete, say 'The media lifecycle workflow has completed successfully'`;
```

Notice how explicit the prompt is—we tell the model exactly what to do, which IDs to track, and what to avoid confusing. This is what allows smaller models to succeed.

Results Across Claude Models

We ran each workflow multiple times across five Claude models. The first table covers the simple 3‑step operation:

| Model | Avg. Time | Avg. Turns | Avg. Cost |
| --- | --- | --- | --- |
| Claude 3.5 Haiku (baseline) | 12.4 s | 4.0 | $0.017 |
| Claude Haiku 4.5 | 8.6 s | 3.7 | $0.019 |
| Claude Sonnet 4 | 13.9 s | 4.0 | $0.025 |
| Claude Sonnet 4.5 | 11.8 s | 3.0 | $0.021 |
| Claude Opus 4.5 | 26.4 s | 8.0 | $0.123 |

Key finding: Haiku 4.5 completed simple tasks ~30 % faster than Haiku 3.5 (12.4 s → 8.6 s) while keeping costs comparable.

Takeaway

By optimising prompts, tracking metrics, and choosing the right model, we can make AI‑driven Umbraco MCP both fast and cost‑effective, while also moving toward a more sustainable computational footprint.

Model Performance Summary (Complex 10‑Step Workflow)

| Model | Time | Turns | Cost |
| --- | --- | --- | --- |
| Haiku 3.5 | 31.1 s | 11 | $0.029 |
| Haiku 4.5 | 21.5 s | 11 | $0.036 |
| Sonnet 4 | 37.9 s | 11 | $0.081 |
| Sonnet 4.5 | 40.4 s | 11 | $0.084 |
| Opus 4.5 | 42.5 s | 11 | $0.134 |

Key finding: All models completed the complex workflow in exactly 11 turns – the task’s complexity normalised the turn count. Execution time and cost, however, varied dramatically.
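The percentage comparisons quoted in this post fall straight out of the table; for clarity, this is the convention used:

```typescript
// "X% faster" here means the reduction in wall-clock time relative
// to the slower model: (slow - fast) / slow * 100.
function percentFaster(slowSeconds: number, fastSeconds: number): number {
  return ((slowSeconds - fastSeconds) / slowSeconds) * 100;
}

// Figures from the complex-workflow summary table above.
const vsHaiku35 = percentFaster(31.1, 21.5); // ≈ 31% faster
const vsOpus45 = percentFaster(42.5, 21.5);  // ≈ 49% faster
```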

Important Caveats

  • Results are based on a small number of test runs – not statistically significant.
  • Prompts were heavily optimised for smaller models; less explicit prompts may favour larger models.
  • This is an exploratory analysis, not a definitive recommendation.

Recommendations for Umbraco MCP Workloads

For our specific Umbraco MCP workloads with well‑structured prompts, Claude Haiku 4.5 (claude‑haiku‑4‑5‑20251001) delivered:

  • 31 % faster execution than Haiku 3.5 on complex workflows.
  • 43‑49 % faster than Sonnet and Opus models.
  • The best cost‑performance ratio across all tests.

Why Larger Models Didn’t Shine

  • Same turn count: Complex workflows required 11 turns regardless of model.
  • Higher latency per turn: Larger models introduced more delay per interaction.
  • 2‑4× higher cost: No corresponding benefit in speed or quality.

For structured MCP tool‑calling tasks with explicit prompts, the additional reasoning capability of larger models didn’t translate into better performance. The task was well‑defined, the tools were documented, and Haiku handled it efficiently.

Cost per 100 Operations

| Model | Approx. Cost |
| --- | --- |
| Haiku 3.5 | ~$2.90 |
| Haiku 4.5 | ~$3.60 |
| Sonnet 4 / 4.5 | ~$8.00 |
| Opus 4.5 | ~$13.40 |

Example: 1,000 AI‑Assisted Operations / month

  • Haiku 4.5: $36 / month
  • Opus 4.5: $134 / month

That’s nearly 4× the cost for slower performance.
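That projection is simple arithmetic over the per‑operation averages from the complex‑workflow runs:

```typescript
// Project monthly spend from an average per-operation cost.
// Per-operation figures are taken from the complex-workflow table above.
function monthlyCost(costPerOpUsd: number, opsPerMonth: number): number {
  return costPerOpUsd * opsPerMonth;
}

const haiku = monthlyCost(0.036, 1000); // ≈ $36 / month
const opus = monthlyCost(0.134, 1000);  // ≈ $134 / month
const ratio = opus / haiku;             // ≈ 3.7 — "nearly 4×"
```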

Updated Default Model

Based on this analysis, Umbraco MCP’s default evaluation model is now Claude Haiku 4.5.

Practical Guidance for Building MCP‑Based Workflows

  1. Start with Haiku 4.5 – fast, capable, and cost‑effective.
  2. Invest in prompt engineering – explicit, well‑structured prompts reduce reliance on larger models. Let the prompt do part of the reasoning.
  3. Measure before upgrading – don’t assume bigger models are better for your use case.
  4. Track your metrics – use the Agent SDK (or similar) to monitor actual cost and performance.

Next Steps in Our Optimisation Journey

  • Add more complex multi‑entity workflows to the evaluation suite.
  • Test edge cases and error‑recovery scenarios.
  • Continue refining prompts for maximum efficiency with smaller models.

Core Takeaway

Umbraco MCP works well even with smaller, faster models when prompts are explicit. You don’t need the most expensive LLM to manage your CMS effectively—clear prompts combined with well‑designed tools are the key.

Analysis date: January 2026
Tooling: Claude Agent SDK against a local Umbraco 17 instance.
Note: Results may vary with network latency, Umbraco configuration, and workflow complexity.
