Your prompts have a vendor lock-in problem and it's hiding in plain text
Source: Dev.to
Background
I’ve been writing prompts for Claude for a while now, using XML tags and nested structures. It works great. When I tried the same prompts with GPT‑4, they fell apart, so I restructured everything to Markdown, added bold headers, and flattened the sections. That worked well on GPT‑4 but degraded performance on Claude. This isn’t just a personal quirk; research shows the effect is systematic.
Sclar et al. (ICLR 2024) measured that removing colons from a prompt template can drop LLaMA‑2‑13B accuracy from 82.6% to 4.3%. He et al. found that the best format for one model family overlaps less than 20% with the best format for another. This pattern is often called prompt sensitivity, prompt brittleness, or model drift, but it is fundamentally a coupling problem, analogous to coupling between software modules.
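To make the failure mode concrete, here is a minimal Python sketch (my own illustration, not taken from the paper) of the kind of perturbation Sclar et al. evaluate: two templates with identical content whose only difference is formatting.

```python
# Two semantically identical prompt templates; only the formatting differs.
# Perturbations this small are what Sclar et al. show can swing accuracy.
template_with_colons = "Question: {q}\nAnswer:"
template_without_colons = "Question {q}\nAnswer"

def render(template: str, q: str) -> str:
    """Fill a prompt template with a question."""
    return template.format(q=q)

print(render(template_with_colons, "What is 2+2?"))
# Question: What is 2+2?
# Answer:
print(render(template_without_colons, "What is 2+2?"))
```

The content a human reads is the same in both; only the model sees a difference.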
The short version
Your prompt is coupled to the model the same way a function can be coupled to another module’s private internals. It works fine until you swap the dependency; then everything breaks, and the cause is invisible because it is just formatting.
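The analogy can be sketched in code (all names here are hypothetical): a caller that reaches into another module’s private internals is what structured design calls content coupling, and a prompt hard-wired to one model’s formatting quirks behaves the same way.

```python
class ClaudeFormatter:
    """Stands in for one model's undocumented formatting preferences."""
    def __init__(self):
        self._delimiter = ":"  # a private detail callers should not rely on

def render_section(formatter: ClaudeFormatter, label: str, body: str) -> str:
    # Content coupling: this function depends on a private attribute.
    # Swap in a "model" whose internals differ and it silently breaks.
    return f"{label}{formatter._delimiter} {body}"

print(render_section(ClaudeFormatter(), "Context", "some background"))
# Context: some background
```

Nothing in the type signature warns you about the dependency; the breakage only shows up at runtime, after the swap.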
Industry Examples
- Aider ships with 313 model‑specific configurations in a 2,718‑line YAML file. Most models receive the system prompt “You NEVER leave comments describing code without implementing it!” while Claude‑3.7‑sonnet gets the opposite instruction, illustrating contradictory expectations across models.
- Claude Code only works with Claude by default. Developers have built proxy layers, Node.js monkey‑patches, and Ollama compatibility shims to make it work with other models. Some vendors now support Anthropic’s API schema to capture Claude Code’s user base.
- Cursor explicitly tells users to “switch to a different model and try again” when prompts underperform, acknowledging the coupling issue.
Tooling Landscape
I surveyed eleven tools that address various aspects of prompt engineering, including:
- DSPy – optimizes prompt content.
- Guidance – constrains model outputs.
- PromptLayer – versions prompts.
- Braintrust, Humanloop, Maxim AI, MLflow, Prompty, Promptomatix – provide tracking, evaluation, or workflow features.
None of these tools handle structural formatting differences between models, leaving a gap in the ecosystem.
My Solution: promptc
I built promptc, a transparent HTTP proxy that sits between your application and the model API. It rewrites prompt structure to match the target model’s preferences (e.g., XML ↔ Markdown conversion, section reordering, delimiter swaps). An optional second pass can use a local Ollama model to convert phrases like “Let’s think step by step” into XML-tagged reasoning blocks.
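As an illustration of the structural rewriting described above, here is a minimal regex-based XML-to-Markdown pass. This is a sketch under my own assumptions, not promptc’s actual implementation; real rewriting has to handle nesting, attributes, and per-model section ordering.

```python
import re

def xml_to_markdown(prompt: str) -> str:
    """Rewrite flat <tag>...</tag> sections into Markdown bold headers.

    Illustrative only: promptc's real rules are more involved.
    """
    def replace_section(match: re.Match) -> str:
        tag, body = match.group(1), match.group(2).strip()
        header = tag.replace("_", " ").title()
        return f"**{header}**\n{body}"
    return re.sub(r"<(\w+)>(.*?)</\1>", replace_section, prompt, flags=re.DOTALL)

xml_prompt = "<context>User is debugging a proxy.</context>\n<task>Explain the error.</task>"
print(xml_to_markdown(xml_prompt))
```

The proxy applies a transform like this in one direction or the other depending on which model the request is bound for.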
Key benefits
- Zero code changes to your existing setup.
- Simple configuration: set `ANTHROPIC_BASE_URL=http://localhost:4000` and run.
Formalization
I authored a paper that formalizes this coupling problem, borrowing the coupling taxonomy from Larry Constantine’s 1974 structured design work (content coupling, common coupling, data coupling, etc.). The paper includes:
- Full analysis of the problem.
- Survey of existing tools.
- Case studies on Aider, Claude Code, and Cursor.
Paper: sharma2026_prompt_coupling.pdf
Code:
Feedback is welcome—especially the “you’re wrong because…” kind.