Build vs. Buy for Agent Harnesses: The Real Question

Published: March 17, 2026 at 01:56 AM EDT

4 min read

Source: Dev.to

Background

A thread on the CTO Lunches mailing list sparked a classic “build vs. buy” debate. One member asked:

“Like many of you, I see trends converging and am struggling to navigate the build vs buy decisions and identify where or when it makes sense to build a bespoke approach (with the attendant maintenance and support) and when to bet on third‑party / managed ‘service’-ish approaches.”

The organizer replied:

“Your question is the question everyone is asking. And I’m not seeing folks deciding yet.”

Another participant concluded after sharing his approach:

“I think you can get a good chunk of the benefit building incrementally and owning your context and it substantially reduces the ‘pick the right one’ problem which to me feels unsolvable right now.”

Fourteen messages later, no consensus had emerged.

The Real‑World Situation

We were building LLM‑powered dialogue systems for customer‑facing, multi‑turn, high‑stakes interactions—where a bad response can have real consequences.

Standard Approach

Write a system prompt.
Append the conversation history.
Send the combined text to the model.

This works in demos, but in production, long conversations cause the model to drift:

Instructions given early (e.g., turn 3) are ignored by later turns (e.g., turn 15).
Rules compete with history for the token budget.
The same question can receive different answers depending on conversation length.
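
To make the budget competition concrete, here is a minimal sketch of the standard approach and of how the rules' share of the prompt shrinks as history grows. Everything in it is an assumption for illustration: the `Turn` type, the sample rules, and the whitespace word count standing in for a real tokenizer. A shrinking share does not prove drift by itself; it only shows why the rules end up competing with history.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str  # "user" or "assistant"
    text: str


def build_naive_prompt(system_prompt: str, history: list[Turn]) -> str:
    """Concatenate the system prompt and the full history into one prompt."""
    lines = [system_prompt]
    lines += [f"{t.role}: {t.text}" for t in history]
    return "\n".join(lines)


def rough_token_count(text: str) -> int:
    """Crude whitespace word count (assumption: stands in for a tokenizer)."""
    return len(text.split())


def rules_share(system_prompt: str, history: list[Turn]) -> float:
    """Fraction of the assembled prompt taken up by the system prompt."""
    total = rough_token_count(build_naive_prompt(system_prompt, history))
    return rough_token_count(system_prompt) / total


# Hypothetical rules and a synthetic 15-turn conversation.
system_prompt = "Never quote prices. Always confirm the account ID first."
history: list[Turn] = []
shares: list[float] = []
for i in range(1, 16):
    history.append(Turn("user", f"Question number {i} about my order"))
    history.append(Turn("assistant", f"Answer number {i} with some details"))
    shares.append(rules_share(system_prompt, history))
```

In this sketch, `shares` falls with every turn because the system prompt stays fixed while the history keeps growing.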

Existing Tools We Evaluated

Tool	What It Provides	What It Lacks
RAG	Stateless retrieval	No awareness of conversation position
Agent frameworks (LangChain, CrewAI)	Tool orchestration	No dialogue state management
Visual bot builders	Intent matching	No context control

None of these addressed the core failure: at each turn, the model sees too much irrelevant context and not enough of the relevant context.

Our Attempts to Patch Existing Solutions

We added layers and workarounds.
Each workaround introduced new drift.
The system became increasingly fragile.

Eventually we stopped patching and started building a dedicated solution.

ExoChat: Building the Right Solution

ExoChat models conversations as finite state machines:

State‑specific prompts – each state has its own prompt.
Structured facts – context is assembled from structured data rather than raw history.
Explicit exit conditions – each state defines when the conversation moves on, so the model only sees what’s relevant for the current step.
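
The three ideas above can be sketched as a small state machine. This is not ExoChat's actual code, which the article does not show. The `State` and `Dialogue` classes, the field names, and the two sample states are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Facts = dict[str, str]


@dataclass
class State:
    name: str
    prompt: str                          # state-specific instructions
    needed_facts: list[str]              # structured facts this state may see
    exit_when: Callable[[Facts], bool]   # explicit exit condition
    next_state: Optional[str] = None     # None marks a terminal state


@dataclass
class Dialogue:
    states: dict[str, State]
    current: str
    facts: Facts = field(default_factory=dict)

    def build_context(self) -> str:
        """Assemble the prompt from the current state's prompt and facts only."""
        state = self.states[self.current]
        visible = {k: self.facts[k] for k in state.needed_facts if k in self.facts}
        fact_lines = [f"{k}: {v}" for k, v in visible.items()]
        return "\n".join([state.prompt, *fact_lines])

    def record(self, key: str, value: str) -> None:
        """Store a structured fact, then advance if the exit condition holds."""
        self.facts[key] = value
        state = self.states[self.current]
        if state.next_state is not None and state.exit_when(self.facts):
            self.current = state.next_state


# Two hypothetical states for a support conversation.
states = {
    "verify": State(
        name="verify",
        prompt="Ask for the account ID. Do not discuss anything else.",
        needed_facts=[],
        exit_when=lambda f: "account_id" in f,
        next_state="diagnose",
    ),
    "diagnose": State(
        name="diagnose",
        prompt="Ask what went wrong with the order.",
        needed_facts=["account_id"],
        exit_when=lambda f: "issue" in f,
    ),
}

dialogue = Dialogue(states=states, current="verify")
before = dialogue.build_context()
dialogue.record("account_id", "A-1001")
dialogue.record("unrelated_note", "customer mentioned the weather")
after = dialogue.build_context()
```

Here `after` contains the `diagnose` prompt and the account ID, but not the unrelated note: a fact reaches the model only if the current state lists it in `needed_facts`. Raw conversation history never enters the context in this sketch.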

Core Principle

Minimum viable context per state – provide only what the model needs right now, not everything it might need later.

Lessons Learned

Diagnose Before Deciding

Ask the right question: What exactly is broken?
- Which turn drifts?
- Which instruction gets ignored?
- Under what conditions?
Most teams jump to custom solutions without answering these questions, so they end up attacking a vaguely defined problem that is hard to solve sustainably.
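
One way to start answering these questions is to replay recorded conversations against explicit rule checks and note the first turn at which each rule breaks. The sketch below assumes each rule can be written as a simple predicate over the reply text, which real rules often cannot. The rule names and sample replies are invented for illustration.

```python
from typing import Callable, Optional

Rule = Callable[[str], bool]  # returns True when a reply obeys the rule


def first_violations(
    replies: list[str], rules: dict[str, Rule]
) -> dict[str, Optional[int]]:
    """Map each rule name to the 1-based turn of its first violation, or None."""
    result: dict[str, Optional[int]] = {name: None for name in rules}
    for turn, reply in enumerate(replies, start=1):
        for name, obeys in rules.items():
            if result[name] is None and not obeys(reply):
                result[name] = turn
    return result


# Hypothetical rules and assistant replies from one recorded conversation.
rules: dict[str, Rule] = {
    "no_prices": lambda reply: "$" not in reply,
    "stays_formal": lambda reply: "lol" not in reply.lower(),
}
replies = [
    "Could you confirm your account ID?",
    "Thank you. What went wrong with the order?",
    "That plan costs $20 per month.",
]
report = first_violations(replies, rules)
# report == {"no_prices": 3, "stays_formal": None}
```

Running the same check over conversations of different lengths gives a rough picture of which instruction is ignored, at which turn, and under what conditions.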

Narrow the Problem

We initially tried to solve the vague problem “LLM dialogues are unreliable,” and that attempt failed. Once we narrowed the focus to a specific defect, context accumulation causing instruction drift in long conversations, the solution became clear and the build‑vs‑buy decision resolved itself: none of the tools we evaluated addressed that precise issue.

Experience Trumps Analysis

Understanding the specific pain point comes from repeatedly hitting the same wall, not from abstract analysis.

Key Takeaways

Build or buy is the wrong starting point.
Diagnose the precise failure in your workflow first.
If existing tools don’t address that exact failure, a targeted build may be justified.
Keep the solution minimal—provide only the context the model needs at each step.

By focusing on the concrete defect rather than the generic dilemma, teams can make informed decisions and avoid unnecessary custom complexity.