Visual Debugging for AI Agents (ANY Framework)
Source: Dev.to
TL;DR
We built LangGraph Studio’s visual debugging experience, but made it work with every AI agent framework. Open source. Local‑first. Try it now.
Traditional debugging tools don’t work for AI agents
- Breakpoints → Agents are async, non‑deterministic
- Print statements → Good luck finding the relevant logs
- Stack traces → Don't show LLM calls or agent decisions
- Unit tests → Hard to test non‑deterministic behavior
What developers told us (from talking to 50+ production teams)
“LangGraph is S‑tier specifically because of visual debugging. But we’re stuck—we can’t switch frameworks without losing the debugger.”
The data
- 94% of production deployments need observability
- LangGraph rated S‑tier for visual execution traces
- All existing solutions are framework‑locked
The landscape
| Solution | Framework support |
|---|---|
| LangGraph Studio | LangGraph only |
| LangSmith | LangChain‑focused |
| Crew Analytics | CrewAI only |
| AutoGen | No visual debugger |
Developers are choosing frameworks based on tooling, not capabilities. That’s backwards.
Introducing OpenClaw Observability Toolkit
Universal visual debugging for AI agents.
Integrations
LangChain
```python
from openclaw_observability.integrations import LangChainCallbackHandler

chain.run(input="query", callbacks=[LangChainCallbackHandler()])
```
Raw Python (works today)
```python
from openclaw_observability import observe

@observe()
def my_agent_function(input):
    return process(input)
```
CrewAI, AutoGen (coming soon)
One tool. All frameworks.
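Under the hood, decorator-based tracing like `@observe()` typically wraps the function, records its inputs, output, timing, and any exception, and hands that record to a tracer. Here is a minimal, purely illustrative sketch of that pattern; it is not the toolkit's actual implementation, and `SPANS` is a stand-in for wherever the real tracer sends data:

```python
import functools
import time
import uuid

# Illustrative in-memory span store; the real toolkit would
# forward spans to its local tracer/UI instead.
SPANS = []

def observe(span_type="function"):
    """Hypothetical sketch of a tracing decorator."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            span = {
                "id": uuid.uuid4().hex,
                "name": fn.__name__,
                "type": span_type,
                "inputs": {"args": args, "kwargs": kwargs},
                "start": time.time(),
            }
            try:
                result = fn(*args, **kwargs)
                span["output"] = result
                span["status"] = "success"
                return result
            except Exception as exc:
                span["status"] = "error"
                span["error"] = repr(exc)
                raise
            finally:
                # Runs on both success and failure, so timing is always recorded.
                span["duration_ms"] = (time.time() - span["start"]) * 1000
                SPANS.append(span)
        return wrapper
    return decorator
```

Because the wrapper records in a `finally` block, failed calls still produce a span with timing and error details, which is what makes failures debuggable.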
Interactive execution graph
```text
┌─────────────────────────────────────┐
│ Customer Service Agent              │
├─────────────────────────────────────┤
│ [User Query: "Why was I charged?"]  │
│              ↓                      │
│      ┌─────────────┐                │
│      │  Classify   │ 🟢 250ms      │ ← Click to inspect
│      │  Intent     │                │
│      └─────────────┘                │
│              ↓                      │
│      ┌─────────────┐                │
│      │  Check      │ 🔴 FAILED     │ ← See error details
│      │  Database   │                │
│      └─────────────┘                │
└─────────────────────────────────────┘
```
Click any node to see:
- Inputs & outputs – what went in, what came out
- LLM calls – full prompts, responses, tokens, cost
- Timing – duration of each step
- Errors – full stack traces with context
Track what matters
- Cost per agent
- Latency per step
- Success rates
- Quality metrics
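Cost per agent, for instance, can be rolled up from the token counts recorded on each LLM-call span. A sketch, assuming made-up placeholder prices and an illustrative span shape:

```python
# Placeholder per-1K-token prices; real prices depend on the model used.
PRICE_PER_1K = {"input": 0.003, "output": 0.015}

def cost_per_agent(spans):
    """Sum estimated LLM cost per agent from recorded token counts."""
    totals = {}
    for s in spans:
        cost = (s["input_tokens"] / 1000) * PRICE_PER_1K["input"] \
             + (s["output_tokens"] / 1000) * PRICE_PER_1K["output"]
        totals[s["agent"]] = totals.get(s["agent"], 0.0) + cost
    return totals
```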
Example: debugging a failing customer‑service query
Without observability
```text
ERROR: Query failed
```
(Good luck figuring out which agent, which step, and why)
With OpenClaw Observability
```text
Trace: customer_query_abc123
├─ Router Agent → Success (200ms)
│   └─ Intent: "billing_issue"
├─ Billing Agent → FAILED (350ms)
│   └─ Database lookup timeout
└─ Support Agent → Not reached
```
Click “Billing Agent” → see full error:
```text
DatabaseTimeout: Connection timeout after 30s
  at check_subscription_status()
  Input: {"user_id": "12345"}
  Database: prod-billing-db (response time: 45s)
```
Root cause: Billing database is slow. Scale it up.
Time to debug: 30 seconds (instead of 3 hours).
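An indented trace tree like the one above can be rendered from flat spans by following parent links. A minimal sketch; the span fields (`name`, `parent`, `status`, `ms`) are illustrative, not the toolkit's schema:

```python
def render_trace(spans, parent=None, depth=0):
    """Recursively render spans whose 'parent' matches, indenting children."""
    lines = []
    for s in (x for x in spans if x.get("parent") == parent):
        lines.append("  " * depth + f"{s['name']} -> {s['status']} ({s['ms']}ms)")
        lines.extend(render_trace(spans, s["name"], depth + 1))
    return lines
```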
Installation
```bash
pip install openclaw-observability
```
```python
from openclaw_observability import observe, init_tracer
from openclaw_observability.span import SpanType

tracer = init_tracer(agent_id="my-agent")

@observe(span_type=SpanType.AGENT_DECISION)
def choose_action(state):
    action = llm.predict(state)
    return action

@observe(span_type=SpanType.TOOL_CALL)
def fetch_data(query):
    return database.query(query)

result = choose_action(current_state)
```
Run the UI:
```bash
python -m openclaw_observability.server
# Open http://localhost:5000
```
Performance & deployment
Contribute
- Framework integrations (CrewAI, AutoGen, custom frameworks)
- UI improvements (filtering, search, real‑time updates)
- Production features (monitoring, alerts, metrics)
GitHub:
Documentation: Quick Start Guide
Examples: examples/ directory
Discord: Join our community
Built with ❤️ by AI agents at Reflectt.