Build vs. Buy for Agent Harnesses: The Real Question
Source: Dev.to
Background
A thread on the CTO Lunches mailing list sparked a classic “build vs. buy” debate. One member asked:
“Like many of you, I see trends converging and am struggling to navigate the build vs buy decisions and identify where or when it makes sense to build a bespoke approach (with the attendant maintenance and support) and when to bet on third‑party / managed ‘service’-ish approaches.”
The organizer replied:
“Your question is the question everyone is asking. And I’m not seeing folks deciding yet.”
Another participant concluded after sharing his approach:
“I think you can get a good chunk of the benefit building incrementally and owning your context and it substantially reduces the ‘pick the right one’ problem which to me feels unsolvable right now.”
Fourteen messages later, no consensus had emerged.
The Real‑World Situation
We were building LLM‑powered dialogue systems for customer‑facing, multi‑turn, high‑stakes interactions, where a bad response can have real consequences.
Standard Approach
- Write a system prompt.
- Append the conversation history.
- Send the combined text to the model.
This works in demos, but in production, long conversations cause the model to drift:
- Instructions given early (e.g., at turn 3) are ignored in later turns (e.g., by turn 15).
- Rules compete with history for the token budget.
- The same question can receive different answers depending on conversation length.
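To make the token-budget point concrete, here is a minimal sketch of the standard approach. It is not the author's code: `count_tokens`, `build_naive_prompt`, and `rule_share` are hypothetical helpers, and the word-count tokenizer is a deliberate simplification. It only shows that the share of the prompt occupied by rules shrinks as history grows.

```python
def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def build_naive_prompt(system_prompt: str, history: list[str]) -> str:
    # The "standard approach": rules first, then every turn so far.
    return "\n".join([system_prompt, *history])


def rule_share(system_prompt: str, history: list[str]) -> float:
    # Fraction of the prompt's tokens that are instructions rather than history.
    total = count_tokens(build_naive_prompt(system_prompt, history))
    return count_tokens(system_prompt) / total


# Illustrative data only; the rule and the messages are made up.
SYSTEM_PROMPT = "Always quote prices in EUR."
history = [f"user: message number {i}" for i in range(1, 16)]

for turn in (3, 15):
    share = rule_share(SYSTEM_PROMPT, history[:turn])
    print(f"turn {turn}: rules are {share:.0%} of the prompt")
```

The instructions never change, but each added turn dilutes them. Whether that dilution causes drift in a given model is an empirical question; the sketch only illustrates why conversation length alone changes what the model is looking at.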
Existing Tools We Evaluated
| Tool | What It Provides | What It Lacks |
|---|---|---|
| RAG | Stateless retrieval | No awareness of conversation position |
| Agent frameworks (LangChain, CrewAI) | Tool orchestration | No dialogue state management |
| Visual bot builders | Intent matching | No context control |
None of these addressed the core failure: at each turn, the model sees too much irrelevant context and not enough of the relevant context.
Our Attempts to Patch Existing Solutions
- Added layers and workarounds.
- Each workaround introduced new drift.
- The system became increasingly fragile.
Eventually we stopped patching and started building a dedicated solution.
ExoChat: Building the Right Solution
ExoChat models conversations as finite state machines:
- State‑specific prompts – each state has its own prompt.
- Structured facts – context is assembled from structured data rather than raw history.
- Explicit exit conditions – each state defines when the conversation moves on, so the model only sees what’s relevant for the current step.
Core Principle
Minimum viable context per state – provide only what the model needs right now, not everything it might need later.
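The post does not show ExoChat's actual API, so the following is only a sketch of how the three ideas above could fit together. Every name here (`State`, `STATES`, `build_context`, `advance`, and the refund scenario) is hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Facts = dict[str, str]


@dataclass(frozen=True)
class State:
    name: str
    prompt: str                          # state-specific instructions
    needs: tuple[str, ...]               # the only fact keys this state may see
    exit_when: Callable[[Facts], bool]   # explicit exit condition
    next_state: Optional[str]            # None marks the final state


# Hypothetical two-state refund flow.
STATES = {
    "collect_order_id": State(
        name="collect_order_id",
        prompt="Ask the customer for their order ID. Do not discuss refunds yet.",
        needs=("customer_name",),
        exit_when=lambda facts: "order_id" in facts,
        next_state="confirm_refund",
    ),
    "confirm_refund": State(
        name="confirm_refund",
        prompt="Confirm the refund amount with the customer.",
        needs=("order_id", "refund_amount"),
        exit_when=lambda facts: facts.get("refund_confirmed") == "yes",
        next_state=None,
    ),
}


def build_context(state: State, facts: Facts) -> str:
    # Assemble the prompt from structured facts, not from raw history.
    lines = [state.prompt]
    lines += [f"{key}: {facts[key]}" for key in state.needs if key in facts]
    return "\n".join(lines)


def advance(state: State, facts: Facts) -> Optional[State]:
    # Stay in the current state until its exit condition holds.
    if not state.exit_when(facts):
        return state
    return STATES[state.next_state] if state.next_state else None


facts = {"customer_name": "Ada", "refund_amount": "20 EUR"}
state = STATES["collect_order_id"]
print(build_context(state, facts))   # includes customer_name, omits refund_amount

facts["order_id"] = "A-1001"         # fact extracted from the customer's reply
state = advance(state, facts)        # exit condition met, move to confirm_refund
print(build_context(state, facts))
```

In this sketch the context size depends on the `needs` of the current state rather than on how many turns have passed, which is one way to read "minimum viable context per state".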
Lessons Learned
Diagnose Before Deciding
Ask the right question: What exactly is broken?
- Which turn drifts?
- Which instruction gets ignored?
- Under what conditions?
Most teams jump to custom solutions without answering these questions, and are left with a vague problem that is hard to solve sustainably.
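One way to answer the first two questions is to replay a recorded conversation against explicit rule checks. The sketch below is an assumption about how such a probe might look, not something described in the original thread: `first_violations`, the rule names, and the sample replies are all hypothetical.

```python
from typing import Callable, Optional

Rule = Callable[[str], bool]  # returns True when a reply obeys the rule


def first_violations(replies: list[str], rules: dict[str, Rule]) -> dict[str, Optional[int]]:
    # For each rule, find the first turn (1-indexed) whose reply breaks it.
    report: dict[str, Optional[int]] = {}
    for name, obeys in rules.items():
        report[name] = next(
            (turn for turn, reply in enumerate(replies, start=1) if not obeys(reply)),
            None,  # None means the rule held for the whole conversation
        )
    return report


# Made-up model replies from one conversation, in turn order.
replies = [
    "The plan costs 10 EUR.",
    "Shipping is 5 EUR.",
    "That will be 12 dollars.",
]
rules = {
    "prices_in_eur": lambda reply: "dollars" not in reply,
    "no_legal_advice": lambda reply: "lawsuit" not in reply.lower(),
}

report = first_violations(replies, rules)
print(report)
```

Running the same probe over conversations of different lengths or topics is a cheap way to approach the third question, the conditions under which a rule breaks.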
Narrow the Problem
We initially tried to solve the vague problem “LLM dialogues are unreliable.” That attempt failed. When we narrowed the focus to context accumulation causing instruction drift in long conversations, the solution became clear and the build‑vs‑buy decision resolved itself: nothing on the market addressed that precise issue.
Experience Trumps Analysis
Understanding the specific pain point comes from repeatedly hitting the same wall, not from abstract analysis.
Key Takeaways
- Build or buy is the wrong starting point.
- Diagnose the precise failure in your workflow first.
- If existing tools don’t address that exact failure, a targeted build may be justified.
- Keep the solution minimal: provide only the context the model needs at each step.
By focusing on the concrete defect rather than the generic dilemma, teams can make informed decisions and avoid unnecessary custom complexity.