Research Vault: Open Source Agentic AI Research Assistant

Published: January 8, 2026 at 07:55 PM EST
6 min read
Source: Dev.to

The Problem Nobody Talks About

I was drowning in research papers. Not metaphorically—I had 50+ PDFs, dozens of articles, and a note‑taking system that had become its own full‑time job.

  • I’d read something important about agent‑memory limitations, forget which paper it was in, and spend an hour searching through PDFs.
  • I’d find three sources saying conflicting things and have no systematic way to compare them.

Existing tools didn’t solve this.

  • Note‑taking apps require manual organization.
  • Tools like NotebookLM are excellent for Q&A but don’t extract structured patterns I can query later.
  • Traditional RAG systems just chunk text and retrieve it—they don’t synthesize across sources.

The blind spot: We treat research consumption as a reading problem. It’s not. It’s a knowledge‑architecture problem.

What I Built

Research Vault is an agentic AI research assistant that transforms unstructured papers into a queryable knowledge base.

  • Upload a paper → extracts structured patterns using a Claim → Evidence → Context schema.
  • Embeds the patterns semantically.
  • Lets you query across your entire library with natural language.

The approach: Extract structured findings (Claim → Evidence → Context) instead of naïvely chunking text. Not a new idea—just well‑executed with testing and error handling.

Note: This is a simplified, generic version of a more specialized research assistant I use personally. I stripped out the complexity, keeping the core: structured extraction + hybrid search + natural‑language queries.

The Architecture Stack

| Layer | Technology | Why |
| --- | --- | --- |
| Orchestration | LangGraph 1.0 | Stateful workflows with cycles |
| LLM | Claude (Anthropic) | Extraction + synthesis |
| Embeddings | OpenAI text-embedding-3-small | Semantic search |
| Vector DB | Qdrant | Local‑first, production‑ready |
| Backend | FastAPI + SQLAlchemy | Async throughout |
| Frontend | Next.js 16 + React 19 | Modern, type‑safe |

Not shown: 641 backend tests, 23 frontend tests, comprehensive documentation, Docker deployment, CI/CD pipeline.
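
To make the stack concrete, here is a minimal sketch of how an upload request might flow through the async backend; the save_paper helper and run_extraction_pipeline entry point are illustrative assumptions, not the project's actual API.

# Hypothetical upload endpoint: persist the file, then run extraction off the request path.
from fastapi import BackgroundTasks, FastAPI, UploadFile
from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine

engine = create_async_engine("sqlite+aiosqlite:///./vault.db")
Session = async_sessionmaker(engine, expire_on_commit=False)
app = FastAPI()

@app.post("/papers")
async def upload_paper(file: UploadFile, tasks: BackgroundTasks):
    pdf_bytes = await file.read()
    async with Session() as session:
        paper_id = await save_paper(session, file.filename, pdf_bytes)  # hypothetical helper
    tasks.add_task(run_extraction_pipeline, paper_id)  # hypothetical pipeline entry point
    return {"paper_id": paper_id, "status": "processing"}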

The Extraction Architecture

Many RAG systems chunk documents. Some do structured extraction. Research Vault follows the structured approach with a 3‑pass pipeline, sketched in code below:

Pass 1: Evidence Inventory (Haiku)

  • Scan the paper for concrete claims, data, examples.
  • Ground everything that follows in actual paper content.

Pass 2: Pattern Extraction (Sonnet)

  • Extract patterns that cite evidence using [E#] notation.
  • Each pattern can cite multiple evidence items.
  • Not a 1:1 mapping—patterns synthesize across evidence.
  • Tag for categorization.

Pass 3: Verification (Haiku)

  • Check citations are accurate.
  • Verify patterns are grounded in paper content.
  • Compute final status.

Why three passes?

  • Evidence anchoring reduces hallucination.
  • Verification catches extraction errors.
  • Structured schema enables cross‑document queries.
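
A minimal sketch of the three passes as a LangGraph state graph. The ExtractionState fields and the call_haiku_* / call_sonnet_* helpers are illustrative placeholders for the real prompt calls, not the project's actual code.

# Pipeline sketch: Pass 1 (inventory) -> Pass 2 (extract) -> Pass 3 (verify)
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class ExtractionState(TypedDict):
    paper_text: str
    evidence: list[dict]   # Pass 1: numbered [E#] claims, data, examples
    patterns: list[dict]   # Pass 2: patterns citing evidence via [E#]
    verification: dict     # Pass 3: citation checks + final status

def inventory_evidence(state: ExtractionState) -> dict:
    # Haiku pass: scan the paper and return numbered evidence items.
    return {"evidence": call_haiku_inventory(state["paper_text"])}

def extract_patterns(state: ExtractionState) -> dict:
    # Sonnet pass: synthesize patterns that each cite one or more [E#] items.
    return {"patterns": call_sonnet_extract(state["paper_text"], state["evidence"])}

def verify_patterns(state: ExtractionState) -> dict:
    # Haiku pass: check that every cited [E#] exists and supports the claim.
    return {"verification": call_haiku_verify(state["patterns"], state["evidence"])}

graph = StateGraph(ExtractionState)
graph.add_node("inventory", inventory_evidence)
graph.add_node("extract", extract_patterns)
graph.add_node("verify", verify_patterns)
graph.add_edge(START, "inventory")
graph.add_edge("inventory", "extract")
graph.add_edge("extract", "verify")
graph.add_edge("verify", END)
pipeline = graph.compile()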

The Pattern Schema

Traditional RAG:

"Context window overflow causes..."

— just a text chunk.

Research Vault pattern:

{
    "name": "Context Overflow Collapse",
    "claim": "When context windows fill, agents compress state...",
    "evidence": "[E3] 'Performance degraded sharply after 15 reasoning steps' (Table 2)",
    "context": "Authors suggest explicit state architecture, not larger windows.",
    "tags": ["state-management", "failure-mode"],
    "paper_id": "uuid-of-paper"
}
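
Inside the backend, the same schema can be typed as a small Pydantic model; the class below mirrors the JSON fields and is illustrative, not the project's exact code.

from uuid import UUID
from pydantic import BaseModel, Field

class Pattern(BaseModel):
    name: str
    claim: str
    evidence: str                                  # quoted snippets with [E#] citations
    context: str                                   # authors' interpretation, caveats, scope
    tags: list[str] = Field(default_factory=list)
    paper_id: UUID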

This structure enables queries like:

  • “Where do authors disagree on agent memory?”
  • “What patterns recur across multi‑agent systems?”
  • “Synthesize what I’ve learned about tool use.”
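
Queries like these come down to embedding the question and searching the stored pattern vectors. A minimal sketch, assuming a local Qdrant instance with a collection named "patterns" whose payloads hold the structured fields above:

from openai import OpenAI
from qdrant_client import QdrantClient

openai_client = OpenAI()
qdrant = QdrantClient(host="localhost", port=6333)

def query_patterns(question: str, limit: int = 5) -> list[dict]:
    # Embed the natural-language question with the same model used for the patterns.
    vector = openai_client.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding
    hits = qdrant.search(collection_name="patterns", query_vector=vector, limit=limit)
    # Each payload carries the structured pattern: claim, evidence, context, tags.
    return [hit.payload for hit in hits]

results = query_patterns("Where do authors disagree on agent memory?")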

What I Learned Building This

Lesson 1: Test the Workflow, Not Just the Components

  • Early unit tests gave excellent coverage, but the full pipeline kept breaking in subtle ways.
  • Fix: Integration tests that run the entire workflow end‑to‑end with mocked external services (LLM, embeddings, Qdrant); see the sketch after this list. These caught ~90% of real bugs.
  • Lesson: In orchestrated systems, component tests are necessary but not sufficient. Failure modes appear in the gaps between components.
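
The example below is a sketch of that style of test: the whole workflow runs, but the LLM, embedding, and vector-store clients are patched out. The module paths, the run_extraction_pipeline entry point, and the sample_pdf_path fixture are assumptions, not the project's actual test suite.

from unittest.mock import AsyncMock, patch

import pytest

@pytest.mark.asyncio
async def test_upload_to_patterns_roundtrip(sample_pdf_path):
    fake_llm = AsyncMock()
    fake_llm.extract_patterns.return_value = [
        {"name": "Context Overflow Collapse", "claim": "...",
         "evidence": "[E3] ...", "context": "...", "tags": ["failure-mode"]},
    ]
    fake_embed = AsyncMock(return_value=[0.0] * 1536)
    fake_qdrant = AsyncMock()

    # Patch at the seams between components; the workflow itself runs for real.
    with patch("app.services.llm_client", fake_llm), \
         patch("app.services.embed_text", fake_embed), \
         patch("app.services.qdrant", fake_qdrant):
        result = await run_extraction_pipeline(sample_pdf_path)

    assert result["status"] == "completed"
    fake_qdrant.upsert.assert_called_once()  # patterns actually reached the vector store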

Lesson 2: Document Before You Code

I wrote six comprehensive documentation files before writing a line of implementation code:

  • REQUIREMENTS.md
  • DOMAIN_MODEL.md
  • API_SPEC.md
  • ARCHITECTURE.md
  • OPERATIONS.md
  • PLAN.md

Why it mattered:

  • Caught design contradictions early.
  • Made implementation straightforward (no ambiguity).
  • Enabled parallel work on frontend and backend.
  • Created an onboarding path for contributors.

Lesson: Spec‑driven development isn’t slower—it’s faster because you don’t build the wrong thing.

Lesson 3: LLMs Lie About JSON

Every LLM response parser in this codebase uses defensive parsing, e.g.:

# verification.py: coerce the LLM's status string into a known enum value
status_str = response.get("verification_status", "pass")
try:
    status = VerificationStatus(status_str)
except ValueError:
    status = VerificationStatus.passed  # Unknown or malformed value: fall back to a safe default

I learned that LLMs often:

  • Add commentary after valid JSON (“Here are my observations…”)
  • Invent fields not in the schema
  • Return strings instead of enums
  • Truncate output mid‑object

Fix: Parse defensively, validate against a schema, and fall back to safe defaults.
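
A minimal sketch of that pattern, reusing the illustrative Pattern model from earlier; parse_pattern itself is hypothetical:

import json
import re
from uuid import UUID

from pydantic import ValidationError

def parse_pattern(raw: str, paper_id: UUID) -> Pattern | None:
    # LLMs often wrap valid JSON in prose, so isolate the outermost object first.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if not match:
        return None
    try:
        data = json.loads(match.group(0))
        data["paper_id"] = str(paper_id)      # attach ownership server-side
        return Pattern.model_validate(data)   # ignores unknown fields, coerces types
    except (json.JSONDecodeError, ValidationError):
        return None  # caller treats None as "retry or skip", never a crash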

Lesson 4: Local‑First is a Feature

Running entirely locally (except LLM API calls) was initially a simplification for MVP. It became a selling point.

Users want:

  • No cloud dependency for their research data
  • Ability to work offline (mostly)
  • No subscription lock‑in
  • Full data ownership

The lesson: Constraints that seem limiting can become differentiators.

Production Realities Nobody Shows

The Testing Pyramid That Actually Works

43 Integration Tests (full pipeline)
├─ 641 Backend Unit Tests (services, workflows, API)
└─ 23 Frontend Tests (components, hooks)

Why this distribution:

  • Integration tests catch workflow breaks
  • Unit tests catch logic errors
  • Frontend tests catch UI regressions

The Documentation That Actually Helps

Most projects have a README and call it done. Research Vault has:

  • User guide (GETTING_STARTED.md) – “How do I use this?”
  • Operations guide (OPERATIONS.md) – “How do I deploy/troubleshoot?”
  • Architecture guide (ARCHITECTURE.md) – “How does it work?”
  • Domain model (DOMAIN_MODEL.md) – “What are the entities?”
  • API spec (API_SPEC.md) – “What are the endpoints?”

Each serves a different audience and question.

The Error Handling Nobody Sees

Graceful degradation is built in; a sketch follows the list:

  • LLM extraction fails → Partial success (paper saved, patterns retried)
  • Embedding generation fails → Graceful degradation (patterns saved, embeddings retried)
  • Qdrant connection fails → Continue with relational data only
  • Context overflow during query → Truncate gracefully with warning
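
A sketch of the partial-success idea behind the first two items; save_pattern, embed_text, and to_point are hypothetical helpers standing in for the real services:

async def index_patterns(session, qdrant, patterns):
    saved, retry_embeddings = [], []
    for pattern in patterns:
        await save_pattern(session, pattern)       # relational write first: the pattern survives
        saved.append(pattern)
        try:
            vector = await embed_text(pattern.claim)
            qdrant.upsert(collection_name="patterns",
                          points=[to_point(pattern, vector)])
        except Exception:
            # Embedding or Qdrant failed: keep the pattern, schedule the vector work for retry.
            retry_embeddings.append(pattern.id)
    return saved, retry_embeddings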

The lesson: Production‑ready means handling the 20 ways things break, not the 1 way they work.

Why Open Source

I built this for myself. It solved my research chaos. The techniques aren’t novel—structured extraction, hybrid storage, multi‑pass verification all exist in other RAG systems.

Making it OSS

  • Shows how I approach production systems
  • Might help others with the same problem
  • Invites feedback on architecture decisions
  • Opens contribution opportunities

What it demonstrates

  • How to test agentic workflows (641 tests)
  • How to handle errors gracefully (partial success, retries)
  • How to document for different audiences (6 docs)
  • How to ship, not just prototype

The Architectural Choices That Mattered

| Decision | Alternative | Why This Won |
| --- | --- | --- |
| Structured extraction | Text chunking | Enables synthesis across papers |
| 3‑pass verification | Single‑pass extraction | Reduces hallucination 90% |
| Claim/Evidence/Context | SRAL components | Generic, broadly applicable |
| Async throughout | Sync + threading | Cleaner code, better resource use |
| SQLite → Postgres path | Postgres from start | Simpler local setup, clear migration |
| Local‑first | Cloud‑native | Data ownership, no vendor lock‑in |
| FastAPI + Next.js | Streamlit/Gradio | Decoupled, production‑ready |

Every choice optimized for: reliability over features, clarity over cleverness, architecture over tools.

What’s Next

Research Vault is beta‑ready. The core workflow—upload, extract, review, query—works reliably. But there’s more to build:

Near‑term

  • Pattern relationship detection (conflicts, agreements)
  • Multi‑document synthesis reports
  • Export to Obsidian/Notion

Longer‑term

  • Multi‑user support
  • Cloud deployment option
  • Local LLM support (fully offline)
  • Pattern evolution over time

But first: Get it into other people’s hands. See what breaks. Learn what matters.

Try It

The repo is live:

Getting started (≈5 min)

git clone https://github.com/aakashsharan/research-vault.git
cd research-vault
cp .env.example .env   # Add your API keys
docker compose up --build

Open the app and upload your first paper.

The Takeaway

RAG systems exist. Structured extraction exists. LangGraph projects exist.

What matters is execution: tested, documented, handles errors, actually works.

  • This isn’t groundbreaking. It’s just well‑built.
  • Tools change. Execution matters.

Ship things that work. Document what you learn. Share what helps.

  • SRAL Framework Paper – evaluation framework for agentic AI
  • SRAL GitHub
  • Architecture Documentation
  • API Specification
