STOP GUESSING: The Observability Stack I Built to Debug My Failing AI Agents

Published: (December 15, 2025 at 10:44 PM EST)
Source: Dev.to

The RAG pipeline is a black box. I got tired of guessing why my bot retrieved the wrong context, so I built an engine for reliable, observable vector retrieval and semantic content verification.

RAG retrieval and LLM output verification are the new bottlenecks in AI development. I built MemVault (for reliable hybrid vector retrieval) and ContextDiff (for deterministic AI output verification). The underlying problem is observability; these are my solutions.

Tool 1: MemVault – The Observable Memory Server

I built MemVault to solve the retrieval‑integrity problem: knowing why the bot retrieved the context it did. Setting up a dedicated vector database is overkill for many projects, so I designed MemVault as an open‑source Node.js wrapper around a stack many of us already run: PostgreSQL + pgvector.

Hybrid Search 2.0: The End of Guesswork

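Most RAG pipelines rely on semantic search alone, which is brittle: an embedding can rank a vaguely related passage above the one containing the exact ID or error code you asked for. MemVault ranks results with a weighted 3‑way hybrid score: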

Component | Technique | Weight
Semantic (Vector) | Cosine similarity via pgvector | 50%
Exact Match (Keyword) | BM25 (Postgres tsvector) for IDs, error codes, etc. | 30%
Recency (Time) | Decay function prioritising recent memories | 20%
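
To make the weighting concrete, here is a minimal sketch of how such a score could be combined. Only the 50/30/20 weights come from the table above. The function names, the assumption that each component is already normalised to the range 0 to 1, and the exponential decay with a 30‑day half‑life are all hypothetical; this is not MemVault's actual implementation.

```typescript
// Weights from the table above; everything else here is an illustrative assumption.
const WEIGHTS = { semantic: 0.5, keyword: 0.3, recency: 0.2 } as const;

interface ComponentScores {
  semantic: number; // cosine similarity, assumed already scaled to [0, 1]
  keyword: number;  // keyword-match rank, assumed normalised to [0, 1]
  ageDays: number;  // age of the memory in days
}

// Hypothetical decay: the score halves every `halfLifeDays` days.
function recencyScore(ageDays: number, halfLifeDays: number = 30): number {
  return Math.pow(0.5, Math.max(0, ageDays) / halfLifeDays);
}

// Weighted sum of the three components.
function hybridScore(c: ComponentScores): number {
  return (
    WEIGHTS.semantic * c.semantic +
    WEIGHTS.keyword * c.keyword +
    WEIGHTS.recency * recencyScore(c.ageDays)
  );
}

// An old, semantically close memory versus a fresh one containing the exact error code.
const similarButStale = hybridScore({ semantic: 0.9, keyword: 0, ageDays: 90 });
const exactAndFresh = hybridScore({ semantic: 0.6, keyword: 1, ageDays: 1 });
```

Under these assumptions `exactAndFresh` ranks above `similarButStale`, even though its vector similarity is lower. That is the behaviour a semantic‑only ranking cannot produce.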

The Visualizer: Debugging in Real‑Time

MemVault offers a dashboard that visualises the vector search as it happens. You can instantly see why a specific document was retrieved and what its weighted score was.
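
The kind of per‑document explanation such a dashboard surfaces can be sketched as a breakdown of each component's contribution to the total. The types and the `explainScore` function below are hypothetical and do not reflect the dashboard's real data model; only the 50/30/20 weights come from the article.

```typescript
// Hypothetical shape: normalised component scores in [0, 1] for one retrieved document.
interface ScoredDoc {
  id: string;
  semantic: number;
  keyword: number;
  recency: number;
}

interface Explanation {
  id: string;
  total: number;
  contributions: { semantic: number; keyword: number; recency: number };
  dominant: "semantic" | "keyword" | "recency";
}

// Splits the weighted score into its parts and names the largest contributor.
function explainScore(doc: ScoredDoc): Explanation {
  const contributions = {
    semantic: 0.5 * doc.semantic,
    keyword: 0.3 * doc.keyword,
    recency: 0.2 * doc.recency,
  };
  const total =
    contributions.semantic + contributions.keyword + contributions.recency;
  let dominant: Explanation["dominant"] = "semantic";
  if (contributions.keyword > contributions[dominant]) dominant = "keyword";
  if (contributions.recency > contributions[dominant]) dominant = "recency";
  return { id: doc.id, total, contributions, dominant };
}

// A document that matched mainly on an exact keyword.
const why = explainScore({ id: "doc-1", semantic: 0.2, keyword: 1.0, recency: 0.1 });
```

Here `why.dominant` is `"keyword"`, which answers the debugging question directly: the document surfaced because of an exact match, not because it was semantically close.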

Live demo: (link omitted in original)

Setup: Choose Your Economic Reality

  • Self‑Host (MIT License) – Run the entire stack (Postgres + Ollama for embeddings) 100% offline via Docker. Ideal for privacy and zero API costs.
  • Managed API (RapidAPI) – Use the hosted service to skip maintenance and infrastructure setup (free tier available).

Quick Start (NPM SDK)

npm install memvault-sdk-jakops88

Tool 2: ContextDiff – Semantic Output Validation

If MemVault ensures you retrieve the right context, ContextDiff makes sure the LLM doesn’t ruin it.

Deterministic Semantic Verification

ContextDiff is a production‑ready FastAPI/Next.js monorepo that runs an LLM‑powered comparison of two texts and returns a structured assessment:

  • Risk Scoring – Objective 0‑100 risk score with a safety determination.
  • Change Detection – Flags specific change types with reasoning:
    • FACTUAL – Critical claims or certainty levels changed (e.g., “will” vs. “might”).
    • TONE – Sentiment or formality shifted.
    • OMISSION/ADDITION – Information was dropped or introduced.
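
A consumer of an assessment like the one described above might model it and gate on it as follows. This is a sketch only: the field names, the `shouldBlock` helper, and the threshold of 30 are assumptions for illustration, not ContextDiff's documented response schema.

```typescript
// Change types named in the list above.
type ChangeType = "FACTUAL" | "TONE" | "OMISSION" | "ADDITION";

// Hypothetical field names; the real API response may differ.
interface DetectedChange {
  type: ChangeType;
  reasoning: string;
}

interface Assessment {
  riskScore: number; // 0-100
  isSafe: boolean;   // the safety determination
  changes: DetectedChange[];
}

// Hypothetical gate: block when flagged unsafe, when risk exceeds the
// threshold, or when any FACTUAL change is present.
function shouldBlock(a: Assessment, maxRisk: number = 30): boolean {
  return (
    !a.isSafe ||
    a.riskScore > maxRisk ||
    a.changes.some((c) => c.type === "FACTUAL")
  );
}

const toneOnly: Assessment = {
  riskScore: 12,
  isSafe: true,
  changes: [{ type: "TONE", reasoning: "More formal wording." }],
};

const certaintyShift: Assessment = {
  riskScore: 20,
  isSafe: true,
  changes: [{ type: "FACTUAL", reasoning: "\"will\" became \"might\"." }],
};
```

With this policy `toneOnly` passes and `certaintyShift` is blocked despite its low score. Treating any FACTUAL change as blocking is one possible design choice for high‑stakes content; a looser pipeline could rely on the score alone.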

Why Simple Diff Fails

Plain text diffs fall short for AI output: they report which characters changed, not whether the meaning did. ContextDiff detects that changing “Q1 2024” to “early 2024” is a semantic change in certainty (a risk), not just a string difference.
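
A small sketch shows the limitation. The token‑level diff below is a deliberately naive stand‑in for a string diff, not how ContextDiff or any particular diff tool works, and the example sentences are made up for illustration.

```typescript
// Lowercases the text and splits it into whitespace-separated tokens.
function tokens(s: string): string[] {
  return s.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
}

// Counts tokens that appear in one text but not in the other.
function changedTokens(a: string, b: string): number {
  const ta = tokens(a);
  const tb = tokens(b);
  const removed = ta.filter((t) => tb.indexOf(t) === -1).length;
  const added = tb.filter((t) => ta.indexOf(t) === -1).length;
  return removed + added;
}

const source = "The feature ships in Q1 2024";
const reworded = "The feature launches in Q1 2024"; // synonym swap, same commitment
const vaguer = "The feature ships in early 2024";   // the date commitment got looser

const rewordedDelta = changedTokens(source, reworded);
const vaguerDelta = changedTokens(source, vaguer);
```

Both deltas come out identical: one token removed, one added. A string‑level view gives the harmless synonym swap and the loosened date commitment the same weight, which is why the comparison has to happen at the level of meaning.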

Use case: High‑stakes content validation (Legal, Medical, Finance) where maintaining the semantic integrity of the source is mandatory.

Demo: (link omitted in original)

Conclusion: Stop Debugging in the Dark

The future of reliable AI engineering hinges on observable, verifiable systems. If you’re tired of treating your RAG pipeline as a black box, explore these tools.

  • MemVault source code: (link omitted in original)
  • ContextDiff API & repository: (search for “ContextDiff”)