Research Vault: Open Source Agentic AI Research Assistant

Published: January 8, 2026 at 07:55 PM EST
6 min read
Source: Dev.to

The Problem Nobody Talks About

I was drowning in research papers. Not metaphorically—I had 50+ PDFs, dozens of articles, and a note‑taking system that had become its own full‑time job.

  • I’d read something important about agent‑memory limitations, forget which paper it was in, and spend an hour searching through PDFs.
  • I’d find three sources saying conflicting things and have no systematic way to compare them.

Existing tools didn’t solve this.

  • Note‑taking apps require manual organization.
  • Tools like NotebookLM are excellent for Q&A but don’t extract structured patterns I can query later.
  • Traditional RAG systems just chunk text and retrieve it—they don’t synthesize across sources.

The blind spot: We treat research consumption as a reading problem. It’s not. It’s a knowledge‑architecture problem.

What I Built

Research Vault is an agentic AI research assistant that transforms unstructured papers into a queryable knowledge base.

  • Upload a paper → extracts structured patterns using a Claim → Evidence → Context schema.
  • Embeds the patterns semantically.
  • Lets you query across your entire library with natural language.

The approach: Extract structured findings (Claim → Evidence → Context) instead of naïvely chunking text. Not a new idea—just well‑executed with testing and error handling.

Note: This is a simplified, generic version of a more specialized research assistant I use personally. I stripped out the complexity, keeping the core: structured extraction + hybrid search + natural‑language queries.

The Architecture Stack

| Layer | Technology | Why |
| --- | --- | --- |
| Orchestration | LangGraph 1.0 | Stateful workflows with cycles |
| LLM | Claude (Anthropic) | Extraction + synthesis |
| Embeddings | OpenAI text-embedding-3-small | Semantic search |
| Vector DB | Qdrant | Local‑first, production‑ready |
| Backend | FastAPI + SQLAlchemy | Async throughout |
| Frontend | Next.js 16 + React 19 | Modern, type‑safe |

Not shown: 641 backend tests, 23 frontend tests, comprehensive documentation, Docker deployment, CI/CD pipeline.
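
To make the stack concrete, here is a minimal sketch of how an upload request might flow through the async backend; the save_paper helper and run_extraction_pipeline entry point are illustrative assumptions, not the project's actual API.

# Hypothetical upload endpoint: persist the file, then run extraction off the request path.
from fastapi import BackgroundTasks, FastAPI, UploadFile
from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine

engine = create_async_engine("sqlite+aiosqlite:///./vault.db")
Session = async_sessionmaker(engine, expire_on_commit=False)
app = FastAPI()

@app.post("/papers")
async def upload_paper(file: UploadFile, tasks: BackgroundTasks):
    pdf_bytes = await file.read()
    async with Session() as session:
        paper_id = await save_paper(session, file.filename, pdf_bytes)  # hypothetical helper
    tasks.add_task(run_extraction_pipeline, paper_id)  # hypothetical pipeline entry point
    return {"paper_id": paper_id, "status": "processing"}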

The Extraction Architecture

Many RAG systems chunk documents. Some do structured extraction. Research Vault follows the structured approach with a 3‑pass pipeline, sketched in code below:

Pass 1: Evidence Inventory (Haiku)

  • Scan the paper for concrete claims, data, examples.
  • Ground everything that follows in actual paper content.

Pass 2: Pattern Extraction (Sonnet)

  • Extract patterns that cite evidence using [E#] notation.
  • Each pattern can cite multiple evidence items.
  • Not a 1:1 mapping—patterns synthesize across evidence.
  • Tag for categorization.

Pass 3: Verification (Haiku)

  • Check citations are accurate.
  • Verify patterns are grounded in paper content.
  • Compute final status.

Why three passes?

  • Evidence anchoring reduces hallucination.
  • Verification catches extraction errors.
  • Structured schema enables cross‑document queries.
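
A minimal sketch of the three passes as a LangGraph state graph. The ExtractionState fields and the call_haiku_* / call_sonnet_* helpers are illustrative placeholders for the real prompt calls, not the project's actual code.

# Pipeline sketch: Pass 1 (inventory) -> Pass 2 (extract) -> Pass 3 (verify)
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class ExtractionState(TypedDict):
    paper_text: str
    evidence: list[dict]   # Pass 1: numbered [E#] claims, data, examples
    patterns: list[dict]   # Pass 2: patterns citing evidence via [E#]
    verification: dict     # Pass 3: citation checks + final status

def inventory_evidence(state: ExtractionState) -> dict:
    # Haiku pass: scan the paper and return numbered evidence items.
    return {"evidence": call_haiku_inventory(state["paper_text"])}

def extract_patterns(state: ExtractionState) -> dict:
    # Sonnet pass: synthesize patterns that each cite one or more [E#] items.
    return {"patterns": call_sonnet_extract(state["paper_text"], state["evidence"])}

def verify_patterns(state: ExtractionState) -> dict:
    # Haiku pass: check that every cited [E#] exists and supports the claim.
    return {"verification": call_haiku_verify(state["patterns"], state["evidence"])}

graph = StateGraph(ExtractionState)
graph.add_node("inventory", inventory_evidence)
graph.add_node("extract", extract_patterns)
graph.add_node("verify", verify_patterns)
graph.add_edge(START, "inventory")
graph.add_edge("inventory", "extract")
graph.add_edge("extract", "verify")
graph.add_edge("verify", END)
pipeline = graph.compile()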

The Pattern Schema

Traditional RAG:

"Context window overflow causes..."

— just a text chunk.

Research Vault pattern:

{
    "name": "Context Overflow Collapse",
    "claim": "When context windows fill, agents compress state...",
    "evidence": "[E3] 'Performance degraded sharply after 15 reasoning steps' (Table 2)",
    "context": "Authors suggest explicit state architecture, not larger windows.",
    "tags": ["state-management", "failure-mode"],
    "paper_id": "uuid-of-paper"
}
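
Inside the backend, the same schema can be typed as a small Pydantic model; the class below mirrors the JSON fields and is illustrative, not the project's exact code.

from uuid import UUID
from pydantic import BaseModel, Field

class Pattern(BaseModel):
    name: str
    claim: str
    evidence: str                                  # quoted snippets with [E#] citations
    context: str                                   # authors' interpretation, caveats, scope
    tags: list[str] = Field(default_factory=list)
    paper_id: UUID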

This structure enables queries like:

  • “Where do authors disagree on agent memory?”
  • “What patterns recur across multi‑agent systems?”
  • “Synthesize what I’ve learned about tool use.”
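
Queries like these come down to embedding the question and searching the stored pattern vectors. A minimal sketch, assuming a local Qdrant instance with a collection named "patterns" whose payloads hold the structured fields above:

from openai import OpenAI
from qdrant_client import QdrantClient

openai_client = OpenAI()
qdrant = QdrantClient(host="localhost", port=6333)

def query_patterns(question: str, limit: int = 5) -> list[dict]:
    # Embed the natural-language question with the same model used for the patterns.
    vector = openai_client.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding
    hits = qdrant.search(collection_name="patterns", query_vector=vector, limit=limit)
    # Each payload carries the structured pattern: claim, evidence, context, tags.
    return [hit.payload for hit in hits]

results = query_patterns("Where do authors disagree on agent memory?")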

What I Learned Building This

Lesson 1: Test the Workflow, Not Just the Components

  • Early unit tests gave excellent coverage, but the full pipeline kept breaking in subtle ways.
  • Fix: Integration tests that run the entire workflow end‑to‑end with mocked external services (LLM, embeddings, Qdrant); see the sketch after this list. These caught ~90% of real bugs.
  • Lesson: In orchestrated systems, component tests are necessary but not sufficient. Failure modes appear in the gaps between components.
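
The example below is a sketch of that style of test: the whole workflow runs, but the LLM, embedding, and vector-store clients are patched out. The module paths, the run_extraction_pipeline entry point, and the sample_pdf_path fixture are assumptions, not the project's actual test suite.

from unittest.mock import AsyncMock, patch

import pytest

@pytest.mark.asyncio
async def test_upload_to_patterns_roundtrip(sample_pdf_path):
    fake_llm = AsyncMock()
    fake_llm.extract_patterns.return_value = [
        {"name": "Context Overflow Collapse", "claim": "...",
         "evidence": "[E3] ...", "context": "...", "tags": ["failure-mode"]},
    ]
    fake_embed = AsyncMock(return_value=[0.0] * 1536)
    fake_qdrant = AsyncMock()

    # Patch at the seams between components; the workflow itself runs for real.
    with patch("app.services.llm_client", fake_llm), \
         patch("app.services.embed_text", fake_embed), \
         patch("app.services.qdrant", fake_qdrant):
        result = await run_extraction_pipeline(sample_pdf_path)

    assert result["status"] == "completed"
    fake_qdrant.upsert.assert_called_once()  # patterns actually reached the vector store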

Lesson 2: Document Before You Code

I wrote six comprehensive documentation files before writing a line of implementation code:

  • REQUIREMENTS.md
  • DOMAIN_MODEL.md
  • API_SPEC.md
  • ARCHITECTURE.md
  • OPERATIONS.md
  • PLAN.md

Why it mattered:

  • Caught design contradictions early.
  • Made implementation straightforward (no ambiguity).
  • Enabled parallel work on frontend and backend.
  • Created an onboarding path for contributors.

Lesson: Spec‑driven development isn’t slower—it’s faster because you don’t build the wrong thing.

Lesson 3: LLMs Lie About JSON

Every LLM response parser in this codebase uses defensive parsing, e.g.:

# verification.py: coerce the LLM's status string into a known enum value
status_str = response.get("verification_status", "pass")
try:
    status = VerificationStatus(status_str)
except ValueError:
    status = VerificationStatus.passed  # Unknown or malformed value: fall back to a safe default

I learned that LLMs often:

  • Add commentary after valid JSON (“Here are my observations…”)
  • Invent fields not in the schema
  • Return strings instead of enums
  • Truncate output mid‑object

Fix: Parse defensively, validate against a schema, and fall back to safe defaults.
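
A minimal sketch of that pattern, reusing the illustrative Pattern model from earlier; parse_pattern itself is hypothetical:

import json
import re
from uuid import UUID

from pydantic import ValidationError

def parse_pattern(raw: str, paper_id: UUID) -> Pattern | None:
    # LLMs often wrap valid JSON in prose, so isolate the outermost object first.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if not match:
        return None
    try:
        data = json.loads(match.group(0))
        data["paper_id"] = str(paper_id)      # attach ownership server-side
        return Pattern.model_validate(data)   # ignores unknown fields, coerces types
    except (json.JSONDecodeError, ValidationError):
        return None  # caller treats None as "retry or skip", never a crash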

Lesson 4: Local‑First is a Feature

Running entirely locally (except LLM API calls) was initially a simplification for MVP. It became a selling point.

Users want:

  • No cloud dependency for their research data
  • Ability to work offline (mostly)
  • No subscription lock‑in
  • Full data ownership

The lesson: Constraints that seem limiting can become differentiators.

Production Realities Nobody Shows

The Testing Pyramid That Actually Works

43 Integration Tests (full pipeline)
├─ 641 Backend Unit Tests (services, workflows, API)
└─ 23 Frontend Tests (components, hooks)

Why this distribution:

  • Integration tests catch workflow breaks
  • Unit tests catch logic errors
  • Frontend tests catch UI regressions

The Documentation That Actually Helps

Most projects have a README and call it done. Research Vault has:

  • User guide (GETTING_STARTED.md) – “How do I use this?”
  • Operations guide (OPERATIONS.md) – “How do I deploy/troubleshoot?”
  • Architecture guide (ARCHITECTURE.md) – “How does it work?”
  • Domain model (DOMAIN_MODEL.md) – “What are the entities?”
  • API spec (API_SPEC.md) – “What are the endpoints?”

Each serves a different audience and question.

The Error Handling Nobody Sees

Graceful degradation is built in; a sketch follows the list:

  • LLM extraction fails → Partial success (paper saved, patterns retried)
  • Embedding generation fails → Graceful degradation (patterns saved, embeddings retried)
  • Qdrant connection fails → Continue with relational data only
  • Context overflow during query → Truncate gracefully with warning
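
A sketch of the partial-success idea behind the first two items; save_pattern, embed_text, and to_point are hypothetical helpers standing in for the real services:

async def index_patterns(session, qdrant, patterns):
    saved, retry_embeddings = [], []
    for pattern in patterns:
        await save_pattern(session, pattern)       # relational write first: the pattern survives
        saved.append(pattern)
        try:
            vector = await embed_text(pattern.claim)
            qdrant.upsert(collection_name="patterns",
                          points=[to_point(pattern, vector)])
        except Exception:
            # Embedding or Qdrant failed: keep the pattern, schedule the vector work for retry.
            retry_embeddings.append(pattern.id)
    return saved, retry_embeddings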

The lesson: Production‑ready means handling the 20 ways things break, not the 1 way they work.

Why Open Source

I built this for myself. It solved my research chaos. The techniques aren’t novel—structured extraction, hybrid storage, multi‑pass verification all exist in other RAG systems.

Making it OSS

  • Shows how I approach production systems
  • Might help others with the same problem
  • Invites feedback on architecture decisions
  • Opens contribution opportunities

What it demonstrates

  • How to test agentic workflows (641 tests)
  • How to handle errors gracefully (partial success, retries)
  • How to document for different audiences (6 docs)
  • How to ship, not just prototype

The Architectural Choices That Mattered

| Decision | Alternative | Why This Won |
| --- | --- | --- |
| Structured extraction | Text chunking | Enables synthesis across papers |
| 3‑pass verification | Single‑pass extraction | Reduces hallucination 90% |
| Claim/Evidence/Context | SRAL components | Generic, broadly applicable |
| Async throughout | Sync + threading | Cleaner code, better resource use |
| SQLite → Postgres path | Postgres from start | Simpler local setup, clear migration |
| Local‑first | Cloud‑native | Data ownership, no vendor lock‑in |
| FastAPI + Next.js | Streamlit/Gradio | Decoupled, production‑ready |

Every choice optimized for: reliability over features, clarity over cleverness, architecture over tools.

What’s Next

Research Vault is beta‑ready. The core workflow—upload, extract, review, query—works reliably. But there’s more to build:

Near‑term

  • Pattern relationship detection (conflicts, agreements)
  • Multi‑document synthesis reports
  • Export to Obsidian/Notion

Longer‑term

  • Multi‑user support
  • Cloud deployment option
  • Local LLM support (fully offline)
  • Pattern evolution over time

But first: Get it into other people’s hands. See what breaks. Learn what matters.

Try It

The repo is live:

Getting started (≈5 min)

git clone https://github.com/aakashsharan/research-vault.git
cd research-vault
cp .env.example .env   # Add your API keys
docker compose up --build

Open the app and upload your first paper.

The Takeaway

RAG systems exist. Structured extraction exists. LangGraph projects exist.

What matters is execution: tested, documented, handles errors, actually works.

  • This isn’t groundbreaking. It’s just well‑built.
  • Tools change. Execution matters.

Ship things that work. Document what you learn. Share what helps.

  • SRAL Framework Paper – evaluation framework for agentic AI
  • SRAL GitHub
  • Architecture Documentation
  • API Specification
