Research Vault: Open Source Agentic AI Research Assistant
The Problem Nobody Talks About
I was drowning in research papers. Not metaphorically—I had 50+ PDFs, dozens of articles, and a note‑taking system that had become its own full‑time job.
- I’d read something important about agent‑memory limitations, forget which paper it was in, and spend an hour searching through PDFs.
- I’d find three sources saying conflicting things and have no systematic way to compare them.
Existing tools didn’t solve this.
- Note‑taking apps require manual organization.
- Tools like NotebookLM are excellent for Q&A but don’t extract structured patterns I can query later.
- Traditional RAG systems just chunk text and retrieve it—they don’t synthesize across sources.
The blind spot: We treat research consumption as a reading problem. It’s not. It’s a knowledge‑architecture problem.
What I Built
Research Vault is an agentic AI research assistant that transforms unstructured papers into a queryable knowledge base.
- Upload a paper → extracts structured patterns using a Claim → Evidence → Context schema.
- Embeds the patterns semantically.
- Lets you query across your entire library with natural language.
The approach: Extract structured findings (Claim → Evidence → Context) instead of naïvely chunking text. Not a new idea—just well‑executed with testing and error handling.
Note: This is a simplified, generic version of a more specialized research assistant I use personally. I stripped out the complexity, keeping the core: structured extraction + hybrid search + natural‑language queries.
The Architecture Stack
| Layer | Technology | Why |
|---|---|---|
| Orchestration | LangGraph 1.0 | Stateful workflows with cycles |
| LLM | Claude (Anthropic) | Extraction + synthesis |
| Embeddings | OpenAI text-embedding-3-small | Semantic search |
| Vector DB | Qdrant | Local‑first, production‑ready |
| Backend | FastAPI + SQLAlchemy | Async throughout |
| Frontend | Next.js 16 + React 19 | Modern, type‑safe |
Not shown: 641 backend tests, 23 frontend tests, comprehensive documentation, Docker deployment, CI/CD pipeline.
The Extraction Architecture
Many RAG systems chunk documents. Some do structured extraction. Research Vault follows the structured approach with a 3‑pass pipeline:
Pass 1: Evidence Inventory (Haiku)
- Scan the paper for concrete claims, data, examples.
- Ground everything that follows in actual paper content.
Pass 2: Pattern Extraction (Sonnet)
- Extract patterns that cite evidence using `[E#]` notation.
- Each pattern can cite multiple evidence items.
- Not a 1:1 mapping—patterns synthesize across evidence.
- Tag for categorization.
Pass 3: Verification (Haiku)
- Check citations are accurate.
- Verify patterns are grounded in paper content.
- Compute final status.
Why three passes?
- Evidence anchoring reduces hallucination.
- Verification catches extraction errors.
- Structured schema enables cross‑document queries.
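To make the flow concrete, here's a minimal sketch of how a three‑pass pipeline like this can be wired up in LangGraph. The state fields, node names, and `call_*` helpers are my own illustration, not the project's actual code:

```python
# Sketch only: three-pass extraction as a LangGraph workflow (names are illustrative)
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class ExtractionState(TypedDict):
    paper_text: str
    evidence: list[dict]   # Pass 1: [E#]-numbered claims, data, examples
    patterns: list[dict]   # Pass 2: patterns citing [E#] items
    status: str            # Pass 3: verification result

def inventory_evidence(state: ExtractionState) -> dict:
    # Pass 1 (Haiku): scan the paper for concrete claims, data, examples
    return {"evidence": call_haiku_inventory(state["paper_text"])}  # hypothetical LLM call

def extract_patterns(state: ExtractionState) -> dict:
    # Pass 2 (Sonnet): synthesize patterns that cite the evidence inventory
    return {"patterns": call_sonnet_extract(state["paper_text"], state["evidence"])}  # hypothetical

def verify(state: ExtractionState) -> dict:
    # Pass 3 (Haiku): check citations resolve and patterns stay grounded
    return {"status": call_haiku_verify(state["patterns"], state["evidence"])}  # hypothetical

graph = StateGraph(ExtractionState)
graph.add_node("inventory", inventory_evidence)
graph.add_node("extract", extract_patterns)
graph.add_node("verify", verify)
graph.add_edge(START, "inventory")
graph.add_edge("inventory", "extract")
graph.add_edge("extract", "verify")
graph.add_edge("verify", END)
pipeline = graph.compile()
```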
The Pattern Schema
Traditional RAG stores just a text chunk:
"Context window overflow causes..."
Research Vault pattern:
```json
{
  "name": "Context Overflow Collapse",
  "claim": "When context windows fill, agents compress state...",
  "evidence": "[E3] 'Performance degraded sharply after 15 reasoning steps' (Table 2)",
  "context": "Authors suggest explicit state architecture, not larger windows.",
  "tags": ["state-management", "failure-mode"],
  "paper_id": "uuid-of-paper"
}
```
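In code, that schema maps cleanly onto a validation model. A sketch (the field names follow the JSON above; the model itself is my illustration, not the repo's):

```python
# Sketch: the pattern schema as a Pydantic model (illustrative)
from pydantic import BaseModel

class Pattern(BaseModel):
    name: str         # Short label, e.g. "Context Overflow Collapse"
    claim: str        # The assertion the paper supports
    evidence: str     # Quoted support with [E#] citations
    context: str      # Scope, caveats, author framing
    tags: list[str]   # Categorization for filtering
    paper_id: str     # UUID of the source paper
```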
This structure enables queries like:
- “Where do authors disagree on agent memory?”
- “What patterns recur across multi‑agent systems?”
- “Synthesize what I’ve learned about tool use.”
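Under the hood, a query like the second one can combine vector similarity with a tag filter. A sketch against Qdrant, assuming a `patterns` collection with a `tags` payload field (both names are assumptions):

```python
# Sketch: semantic search over patterns, narrowed by tag (collection/field names assumed)
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

client = QdrantClient(url="http://localhost:6333")

def query_patterns(query_vector: list[float], tag: str, limit: int = 5):
    # Vector similarity restricted to patterns carrying a specific tag
    return client.search(
        collection_name="patterns",
        query_vector=query_vector,
        query_filter=Filter(
            must=[FieldCondition(key="tags", match=MatchValue(value=tag))]
        ),
        limit=limit,
    )
```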
What I Learned Building This
Lesson 1: Test the Workflow, Not Just the Components
- Early unit tests gave excellent coverage, but the full pipeline kept breaking in subtle ways.
- Fix: Integration tests that run the entire workflow end‑to‑end with mocked external services (LLM, embeddings, Qdrant). These caught ~90% of the real bugs.
- Lesson: In orchestrated systems, component tests are necessary but not sufficient. Failure modes appear in the gaps between components.
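The shape of those tests, roughly (the patch targets, fixture, and `ingest_paper`/`query_library` entry points are illustrative, not the repo's actual module paths):

```python
# Sketch: end-to-end workflow test with all external services mocked (names illustrative)
from unittest.mock import AsyncMock, patch

import pytest

FAKE_PATTERN = {"name": "Context Overflow Collapse", "tags": ["failure-mode"]}

@pytest.mark.asyncio  # requires pytest-asyncio
async def test_upload_to_query_roundtrip():
    # Mock the LLM, embeddings, and vector store so the test exercises only our orchestration
    with patch("app.services.llm.extract_patterns", new=AsyncMock(return_value=[FAKE_PATTERN])), \
         patch("app.services.embeddings.embed", new=AsyncMock(return_value=[0.1] * 1536)), \
         patch("app.services.vectors.search", new=AsyncMock(return_value=[FAKE_PATTERN])):
        await ingest_paper("fixtures/sample.pdf")             # hypothetical upload entry point
        results = await query_library("agent memory limits")  # hypothetical query API
        assert results, "pipeline produced no queryable patterns"
```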
Lesson 2: Document Before You Code
I wrote six comprehensive documentation files before writing a line of implementation code:
`REQUIREMENTS.md`, `DOMAIN_MODEL.md`, `API_SPEC.md`, `ARCHITECTURE.md`, `OPERATIONS.md`, and `PLAN.md`.
Why it mattered:
- Caught design contradictions early.
- Made implementation straightforward (no ambiguity).
- Enabled parallel work on frontend and backend.
- Created an onboarding path for contributors.
Lesson: Spec‑driven development isn’t slower—it’s faster because you don’t build the wrong thing.
Lesson 3: LLMs Lie About JSON
Every LLM response parser in this codebase uses defensive parsing, e.g.:
```python
# verification.py
status_str = response.get("verification_status", "pass")
try:
    status = VerificationStatus(status_str)
except ValueError:
    status = VerificationStatus.passed  # Fallback to a safe default
```
I learned that LLMs often:
- Add commentary after valid JSON (“Here are my observations…”)
- Invent fields not in the schema
- Return strings instead of enums
- Truncate output mid‑object
Fix: Parse defensively, validate against a schema, and fall back to safe defaults.
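The “commentary after valid JSON” case, for instance, yields to `json.JSONDecoder.raw_decode`, which parses one object and ignores whatever trails it. A sketch of such a helper (my own, not the repo's code):

```python
# Sketch: pull the first JSON object out of an LLM reply, ignoring trailing commentary
import json

def parse_llm_json(raw: str) -> dict:
    decoder = json.JSONDecoder()
    start = raw.find("{")
    if start == -1:
        return {}  # No JSON at all: caller falls back to safe defaults
    try:
        obj, _end = decoder.raw_decode(raw, start)  # Stops at the end of the object
        return obj
    except json.JSONDecodeError:
        return {}  # Truncated mid-object: treat as empty, retry or default upstream
```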
Lesson 4: Local‑First is a Feature
Running entirely locally (except LLM API calls) was initially a simplification for MVP. It became a selling point.
Users want:
- No cloud dependency for their research data
- Ability to work offline (mostly)
- No subscription lock‑in
- Full data ownership
The lesson: Constraints that seem limiting can become differentiators.
Production Realities Nobody Shows
The Testing Pyramid That Actually Works
```
43 Integration Tests (full pipeline)
├─ 641 Backend Unit Tests (services, workflows, API)
└─ 23 Frontend Tests (components, hooks)
```
Why this distribution:
- Integration tests catch workflow breaks
- Unit tests catch logic errors
- Frontend tests catch UI regressions
The Documentation That Actually Helps
Most projects have a README and call it done. Research Vault has:
- User guide (`GETTING_STARTED.md`) – “How do I use this?”
- Operations guide (`OPERATIONS.md`) – “How do I deploy/troubleshoot?”
- Architecture guide (`ARCHITECTURE.md`) – “How does it work?”
- Domain model (`DOMAIN_MODEL.md`) – “What are the entities?”
- API spec (`API_SPEC.md`) – “What are the endpoints?”
Each serves a different audience and question.
The Error Handling Nobody Sees
Graceful degradation is built in:
- LLM extraction fails → Partial success (paper saved, patterns retried)
- Embedding generation fails → Graceful degradation (patterns saved, embeddings retried)
- Qdrant connection fails → Continue with relational data only
- Context overflow during query → Truncate gracefully with warning
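The embedding case, for instance, comes down to catching the failure per pattern and queueing a retry. A sketch, where the `embeddings`, `vectors`, and `retry_queue` services and `EmbeddingError` are hypothetical:

```python
# Sketch: partial success when embedding fails (all service names hypothetical)
async def embed_patterns(patterns: list) -> None:
    for pattern in patterns:
        try:
            vector = await embeddings.embed(pattern.claim)
            await vectors.upsert(pattern.paper_id, vector, payload=pattern.dict())
        except EmbeddingError:
            # The pattern is already persisted relationally; only its vector is missing.
            await retry_queue.enqueue("embed_pattern", pattern.id)
```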
The lesson: Production‑ready means handling the 20 ways things break, not the 1 way they work.
Why Open Source
I built this for myself. It solved my research chaos. The techniques aren’t novel—structured extraction, hybrid storage, multi‑pass verification all exist in other RAG systems.
Making it OSS
- Shows how I approach production systems
- Might help others with the same problem
- Invites feedback on architecture decisions
- Opens contribution opportunities
What it demonstrates
- How to test agentic workflows (641 tests)
- How to handle errors gracefully (partial success, retries)
- How to document for different audiences (6 docs)
- How to ship, not just prototype
The Architectural Choices That Mattered
| Decision | Alternative | Why This Won |
|---|---|---|
| Structured extraction | Text chunking | Enables synthesis across papers |
| 3‑pass verification | Single‑pass extraction | Reduces hallucination by 90% |
| Claim/Evidence/Context | SRAL components | Generic, broadly applicable |
| Async throughout | Sync + threading | Cleaner code, better resource use |
| SQLite → Postgres path | Postgres from start | Simpler local setup, clear migration |
| Local‑first | Cloud‑native | Data ownership, no vendor lock‑in |
| FastAPI + Next.js | Streamlit/Gradio | Decoupled, production‑ready |
Every choice optimized for: reliability over features, clarity over cleverness, architecture over tools.
What’s Next
Research Vault is beta‑ready. The core workflow—upload, extract, review, query—works reliably. But there’s more to build:
Near‑term
- Pattern relationship detection (conflicts, agreements)
- Multi‑document synthesis reports
- Export to Obsidian/Notion
Longer‑term
- Multi‑user support
- Cloud deployment option
- Local LLM support (fully offline)
- Pattern evolution over time
But first: Get it into other people’s hands. See what breaks. Learn what matters.
Try It
The repo is live:
Getting started (≈5 min)
```bash
git clone https://github.com/aakashsharan/research-vault.git
cd research-vault
cp .env.example .env  # Add your API keys
docker compose up --build
```
Open the app and upload your first paper.
The Takeaway
RAG systems exist. Structured extraction exists. LangGraph projects exist.
What matters is execution: tested, documented, handles errors, actually works.
- This isn’t groundbreaking. It’s just well‑built.
- Tools change. Execution matters.
Ship things that work. Document what you learn. Share what helps.
Related Reading
- SRAL Framework Paper (evaluation framework for agentic AI)
- SRAL GitHub
- Architecture Documentation
- API Specification