Why we replaced LangChain with the raw Anthropic SDK in production

Published: April 21, 2026 at 07:40 AM EDT
3 min read
Source: Dev.to

The symptoms

LangChain’s abstractions started leaking the moment we went beyond happy‑path demos. Three things kept biting us:

Stack traces from hell. A single AgentExecutor.invoke() call crossed 14 frames of LangChain internals before reaching our code. Debugging a malformed tool call felt like archaeology.

Version churn. Every minor bump renamed, relocated, or deprecated something we depended on. Our CI was pinned to a specific LangChain SHA for six months just to stay green.

Abstracted‑away observability. We couldn’t cleanly trace token usage, cache hits, or per‑tool latencies without monkey‑patching internal classes.

Meanwhile, Anthropic’s native SDK was getting better. Native tool calling, prompt caching, extended thinking, streaming — all first‑class and documented.
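Prompt caching, for example, is just a field on the request. A minimal sketch of how we mark a stable system-prompt prefix for caching (the prompt text is illustrative, and no API call is made here, only request construction):

```python
# Sketch: building the `system` parameter so a stable prefix is cached
# across calls. The prompt text below is a placeholder, not our real one.
LONG_SYSTEM_PROMPT = "You are an interview assistant..."  # illustrative

def cached_system_param(prompt: str) -> list[dict]:
    # cache_control asks the API to reuse this prefix on subsequent calls.
    return [{
        "type": "text",
        "text": prompt,
        "cache_control": {"type": "ephemeral"},
    }]

request_kwargs = {
    "model": "claude-opus-4-7",
    "max_tokens": 1024,
    "system": cached_system_param(LONG_SYSTEM_PROMPT),
}
```

On a long, stable system prompt, the cached prefix is billed at a fraction of normal input-token cost on cache hits, which is exactly the kind of knob we couldn't reach through LangChain without patching internals.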

The refactor

The logic we were using LangChain for wasn’t complicated:

  • Build a system prompt from templates
  • Call Claude with a list of tools
  • Route tool calls to our internal handlers
  • Return the result

We replaced ~800 lines of LangChain glue with this:

from anthropic import Anthropic

client = Anthropic()

SYSTEM_PROMPT = "..."  # built from our prompt templates elsewhere

def run_agent(user_input: str, tools: list[dict], tool_handlers: dict):
    messages = [{"role": "user", "content": user_input}]

    while True:
        response = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=4096,
            tools=tools,
            messages=messages,
            system=SYSTEM_PROMPT,
        )

        if response.stop_reason == "end_turn":
            # Join all text blocks; the final turn can contain more than one.
            return "".join(b.text for b in response.content if b.type == "text")

        # Handle tool use: echo the assistant turn back, then answer each
        # tool_use block with a matching tool_result.
        tool_calls = [b for b in response.content if b.type == "tool_use"]
        messages.append({"role": "assistant", "content": response.content})

        tool_results = []
        for call in tool_calls:
            result = tool_handlers[call.name](**call.input)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": call.id,
                "content": str(result),
            })
        messages.append({"role": "user", "content": tool_results})
That’s it. No AgentExecutor, no Callback, no ConversationBufferMemory. Just the model and our code.
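Wiring it up is just two plain Python structures: a list of tool schemas and a dict of handlers. A hypothetical example (the `get_weather` tool and its stub implementation are illustrative, not from our service):

```python
# Hypothetical tool schema + handler pair for the run_agent loop above.
TOOLS = [{
    "name": "get_weather",
    "description": "Return current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

def get_weather(city: str) -> dict:
    # Stand-in implementation; production code would call a weather API.
    return {"city": city, "temp_c": 21}

TOOL_HANDLERS = {"get_weather": get_weather}

# run_agent("What's the weather in Paris?", TOOLS, TOOL_HANDLERS)
# would drive the loop until the model returns end_turn.
```

Because handlers are just functions keyed by name, testing them is ordinary unit testing, with no framework fixtures required.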

The metrics

We ran the old and new paths side‑by‑side for two weeks on Vettio’s interview‑bot service. Results:

  • p50 latency: 2.1 s → 1.4 s (‑33%)
  • p95 latency: 4.8 s → 3.2 s (‑33%)
  • Error rate: 0.9 % → 0.2 %
  • Stack trace depth on errors: 14 → 4 frames
  • Lines of integration code: 812 → 187

The latency win came mostly from eliminating LangChain’s implicit retry‑and‑retry‑again behavior on tool‑use mismatches. With direct SDK calls, a malformed tool schema fails loudly instead of silently retrying three times.
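We lean into that fail-loudly behavior with a preflight check before the first API call. A sketch (the function name and error format are ours, assumed rather than from any SDK):

```python
def check_tool_wiring(tools: list[dict], tool_handlers: dict) -> None:
    # Fail loudly at startup if tool schemas and handlers are out of sync,
    # instead of discovering the mismatch mid-conversation.
    schema_names = {t["name"] for t in tools}
    handler_names = set(tool_handlers)
    missing = schema_names - handler_names
    extra = handler_names - schema_names
    if missing or extra:
        raise ValueError(f"tool mismatch: missing={missing}, extra={extra}")
```

One `ValueError` at deploy time is a far cheaper failure mode than three silent retries per request in production.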

When LangChain still makes sense

This isn’t a blanket “don’t use LangChain” post. It still wins if you need:

  • Multi‑provider abstraction. Swapping between Claude, GPT‑4, and Gemini behind a stable interface.
  • LangGraph workflows for graph‑based agent topologies you’d otherwise build from scratch.
  • LangSmith observability you don’t want to rebuild.

For a team that’s already committed to one provider (we’re all‑in on Claude) and wants full control over prompts, tool schemas, and observability — the native SDK is the right tool in 2026.

The lesson

Abstractions pay for themselves when the underlying APIs are bad. Anthropic’s API isn’t bad. It’s clean, well‑documented, and stable. The abstraction tax was real; the abstraction benefit had quietly evaporated.

If you’re still on LangChain in a production Claude app, benchmark a direct‑SDK rewrite of your hot path. You might be surprised.
