RAG for Developers — Built for Code, Not Just Text (Review Requested)
Source: Dev.to

Code-Aware RAG Tool — Looking for Developer Feedback
We’ve been building a code‑first RAG tool that actually understands how codebases work, not just how text looks in embeddings. The goal is simple: when you ask a question, you get the right functions, related calls, and supporting code, not random nearby snippets.
What’s Inside
- AST‑based code chunking with Tree‑sitter (Python, JavaScript, TypeScript)
- Explicit extraction of functions, classes, imports, calls, and docstrings
- A clean async ingestion pipeline with strict tool → agent → storage boundaries
- Semantic vector search as the starting point, not the end
- In‑memory dependency graph expansion
  - Built lazily from chunk metadata
  - No persistence, no globals, no backend shortcuts
- Stable qualified IDs (file::entity)
- Context expansion via BFS over calls and imports to pull in code that’s actually connected
- Backend‑agnostic vector store layer, so storage can change without rewriting logic
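To make the graph expansion step easier to picture, here is a minimal sketch of BFS over a graph derived from chunk metadata. It is an illustration under stated assumptions, not the tool's actual code: the `Chunk` record, its field names, `build_graph`, `expand_context`, and the `max_depth` / `max_nodes` limits are all hypothetical, and the seed IDs stand in for whatever the vector search returns.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical chunk record; field names are illustrative only."""
    qid: str                        # stable qualified ID, "file::entity"
    calls: tuple[str, ...] = ()     # QIDs of entities this chunk calls
    imports: tuple[str, ...] = ()   # QIDs of entities this chunk imports


def build_graph(chunks):
    """Derive adjacency lists from chunk metadata. Nothing is persisted."""
    known = {c.qid for c in chunks}
    return {
        c.qid: [t for t in (*c.calls, *c.imports) if t in known]
        for c in chunks
    }


def expand_context(seed_qids, chunks, max_depth=2, max_nodes=20):
    """BFS outward from the seed chunks, bounded by depth and result size."""
    graph = build_graph(chunks)     # built on demand, dropped when we return
    ordered, visited = [], set()
    queue = deque((q, 0) for q in seed_qids if q in graph)
    while queue and len(ordered) < max_nodes:
        qid, depth = queue.popleft()
        if qid in visited:
            continue
        visited.add(qid)
        ordered.append(qid)
        if depth < max_depth:
            for nxt in graph[qid]:
                if nxt not in visited:
                    queue.append((nxt, depth + 1))
    return ordered


# Made-up example data, not taken from a real repository.
chunks = [
    Chunk("api.py::handle", calls=("auth.py::check", "db.py::save")),
    Chunk("auth.py::check", calls=("db.py::load",)),
    Chunk("db.py::save"),
    Chunk("db.py::load"),
    Chunk("util.py::unused"),
]

print(expand_context(["api.py::handle"], chunks, max_depth=1))
# ['api.py::handle', 'auth.py::check', 'db.py::save']
```

Because the graph is rebuilt from the chunks on each call, there is no separate graph state to keep in sync with the vector store. The trade-off in this sketch is the rebuild cost per query, which a real pipeline would likely want to cache per request.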
Why We Think This Is Useful
- You get related code paths, not just similar text
- Context stays small, relevant, and debuggable
- The architecture avoids hidden state and scaling surprises
What We’d Love Feedback On
If you’ve worked with large repos or built RAG systems before, we’d really value your thoughts on:
- The “graph as derived state” design
- Chunk metadata choices (calls, imports, QIDs)
- Retrieval + expansion flow
- Any edge cases you think would show up in real production codebases
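For anyone who wants something concrete to react to on the metadata question, here is a rough sketch of what a per-chunk record could look like. It is not the tool's code: it uses Python's standard-library `ast` module in place of Tree-sitter to stay dependency-free, and `extract_chunks`, the dictionary keys, and the sample source are all hypothetical.

```python
import ast


def extract_chunks(source: str, filename: str) -> list[dict]:
    """Return one metadata record per top-level function or class."""
    tree = ast.parse(source)

    # Module-level imports, recorded as dotted names.
    imports = []
    for node in tree.body:
        if isinstance(node, ast.Import):
            imports += [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            mod = node.module or ""
            imports += [f"{mod}.{alias.name}" for alias in node.names]

    chunks = []
    for node in tree.body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        calls = []
        for sub in ast.walk(node):
            if isinstance(sub, ast.Call):
                func = sub.func
                if isinstance(func, ast.Name):
                    calls.append(func.id)       # plain call: foo()
                elif isinstance(func, ast.Attribute):
                    calls.append(func.attr)     # obj.foo(): receiver type unknown
        chunks.append({
            "qid": f"{filename}::{node.name}",  # "file::entity" style ID
            "kind": type(node).__name__,
            "docstring": ast.get_docstring(node),
            "calls": sorted(set(calls)),
            "imports": list(imports),
        })
    return chunks


# Made-up sample source for illustration.
SAMPLE = '''
import os
from pathlib import Path

def load(path):
    """Read a file."""
    return Path(path).read_text()

class Store:
    def save(self, path, data):
        self.validate(data)
        return os.path.join(path, "out")
'''

for chunk in extract_chunks(SAMPLE, "store.py"):
    print(chunk["qid"], chunk["calls"])
# store.py::load ['Path', 'read_text']
# store.py::Store ['join', 'validate']
```

Even this small sketch surfaces a few of the edge cases in question. Attribute calls such as `self.validate(...)` are recorded by bare name, so they cannot be mapped to a qualified ID without some form of type or scope resolution. Methods are folded into their class chunk here, and two definitions with the same name in one file would collide on the same `file::entity` ID.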
Even quick reactions or gut checks are welcome.