Implementing a RAG system: Crawl
Source: Dev.to
Introduction
I’m starting a “Crawl, walk, run” series of posts on various topics and decided to begin with Retrieval‑Augmented Generation (RAG). In this phase we’ll cover the core concepts of a RAG system and apply them in a simple example using the Government of British Columbia’s HR policy PDFs as our knowledge base. We will process, chunk, and embed the documents into a local vector database, allowing the agent to provide grounded answers directly sourced from the ingested BC government policies.
RAG is a common design pattern that turns a standard LLM into an informed AI agent. A standard model answers only from what it saw during training, like a closed-book exam. RAG gives your agent an “open‑book test” instead: at query time it retrieves relevant passages from your documents, which works around knowledge cut‑offs and lets answers be grounded in, and cite, the source text. No fine‑tuning is required, and the data can be updated quickly by re‑indexing documents rather than retraining the model.
Chunking and Splitting
It isn’t feasible to feed all the information into the AI for every query. Instead, the text is broken down into smaller, more manageable pieces called chunks, which the AI can process and retrieve efficiently.
Code: Splitting PDFs with LangChain
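    from langchain_community.document_loaders import PyPDFLoader, DirectoryLoader
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    # Assumed location of the policy PDFs; adjust to your own folder.
    DATA_DIR = "./data"

    # Load every PDF under DATA_DIR (including subfolders), one Document per page.
    loader = DirectoryLoader(
        DATA_DIR,
        glob="./**/*.pdf",
        loader_cls=PyPDFLoader
    )
    docs = loader.load()

    # Split pages into chunks of up to 1000 characters with up to 100 characters of overlap.
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=100
    )
    chunks = text_splitter.split_documents(docs)

Recursive character chunking improves on naive fixed‑size chunking by trying to split on natural boundaries first (paragraphs, then lines, then words) and only falling back to individual characters when a piece is still too long. The chunk_overlap setting repeats up to 100 characters from the end of one chunk at the start of the next, which reduces the chance that a sentence is lost by being cut at a chunk boundary.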
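To make the chunk_size and chunk_overlap parameters above more concrete, here is a minimal sketch of plain fixed‑size chunking with overlap. It is a simplification for illustration only: the function name and sample sentence are made up, and it is not how RecursiveCharacterTextSplitter works internally, since that splitter also prefers to break on separators such as paragraphs and spaces.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Illustrative sample text (not taken from the BC policy documents).
sample = "Employees accrue vacation monthly. Unused days carry over once."
pieces = chunk_with_overlap(sample, chunk_size=30, chunk_overlap=10)

for piece in pieces:
    print(repr(piece))
```

Each chunk starts with the last 10 characters of the previous one, so text that straddles a boundary shows up in both neighbours.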
Embedding
The embedding process transforms text chunks into vectors—arrays of floating‑point numbers that capture semantic meaning. Higher dimensionality does not always yield better results; for simple documents it can add latency and computational overhead without improving search accuracy.
Key Points
- Consistent Model: Use the same embedding model for both indexing and retrieval. Each model maps text into its own vector space (and often a different number of dimensions), so a query vector from one model cannot be meaningfully compared with document vectors from another.
- Embedding Type Selection: Documents are usually long and structured, while user queries are short and noisy. Choose an embedding model that handles this asymmetry well.
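The idea of turning text into a fixed‑length array of floats can be sketched with a toy embedder. This is purely an illustration under loud assumptions: toy_embed is a hypothetical function that hashes words into buckets, so it captures no semantic meaning at all and is not a substitute for a real embedding model. It only shows the shape of the data and why indexing and querying need the same function.

```python
import hashlib
import math


def toy_embed(text: str, dim: int = 16) -> list[float]:
    """Hash each word into one of `dim` buckets, then L2-normalize the counts."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


# The same function (same "model", same dimension) is used for documents and queries.
doc_vec = toy_embed("vacation leave policy")
query_vec = toy_embed("Vacation leave policy")

# A different configuration produces vectors of a different length,
# which cannot be compared with the indexed ones.
other_vec = toy_embed("vacation leave policy", dim=8)

print(len(doc_vec), len(query_vec), len(other_vec))
```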
Similarity Search
Once a user query is embedded, the RAG system performs a similarity search against the vector database to find the most relevant chunks. A common metric is cosine similarity, which measures the angle between two vectors rather than their magnitude (vector databases typically also offer alternatives such as Euclidean distance or dot product). Because magnitude is ignored, the system can match intent across texts of varying length and word frequency.
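As a rough sketch of what a similarity search does, the snippet below computes cosine similarity by hand and returns the top‑k matches from a small in‑memory index. The vectors, chunk ids, and the top_k helper are invented for illustration; a real vector database would use an optimized index rather than a linear scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 2):
    """Score every indexed vector against the query and keep the k best."""
    scored = [(cosine_similarity(query_vec, vec), chunk_id)
              for chunk_id, vec in index.items()]
    scored.sort(reverse=True)
    return scored[:k]


# Made-up 3-dimensional vectors standing in for embedded chunks.
index = {
    "vacation": [1.0, 0.0, 0.0],
    "vacation_long": [10.0, 0.0, 0.0],  # same direction, larger magnitude
    "sick_leave": [0.0, 1.0, 0.0],
}
results = top_k([2.0, 0.0, 0.0], index, k=2)
print(results)
```

Both "vacation" vectors score the same against the query even though their magnitudes differ by a factor of ten, which is the property that makes cosine similarity insensitive to text length.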
Agent Implementation
To handle HR questions, I built an agent using Google’s Agent Development Kit (ADK) that connects directly to the RAG system.
Code: Defining the HR Agent
    from google.adk.agents import LlmAgent
    from google.adk.tools import FunctionTool

    from .tools import query_hr

    # Wrap the retrieval function so the agent can call it as a tool.
    hr_rag_tool = FunctionTool(func=query_hr)

    hr_agent = LlmAgent(
        name="hr_agent",
        model="gemini-3.1-pro-preview",
        description="Specialist in company HR policies and procedures.",
        instruction=(
            "You are a professional HR assistant. Your goal is to answer questions "
            "using ONLY the information retrieved from the 'query_hr' tool. "
            "When calling the 'query_hr' tool, ensure all string arguments are properly formatted as standard JSON strings with double quotes.\n\n"
            "RULES:\n"
            "1. If the tool returns relevant information, summarize it clearly.\n"
            "2. You MUST cite your sources using the format: (Source: [Source Name], Page: [Page Number]).\n"
            "3. If the tool results do not contain the answer, state: 'I'm sorry, I couldn't find that in our HR documents.'\n"
            "4. Do not use outside knowledge or make up facts about company policy."
        ),
        # Pass the wrapped tool defined above.
        tools=[hr_rag_tool],
    )

With clear instructions and a tool for searching the vector database, the agent can return precise, grounded answers to users in seconds.
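The query_hr tool itself lives in the repository and is not shown here. As a hedged sketch of what such a tool might hand back to the agent, the hypothetical helper below formats retrieved chunks so that each one carries the source and page the instructions ask the agent to cite. The function name, the "NO_RESULTS" marker, and the shape of the hits are all assumptions for illustration; the metadata keys mirror the "source" and "page" fields that PyPDFLoader attaches to each document.

```python
def format_hits_for_agent(hits: list[dict]) -> str:
    """Turn retrieved chunks into text the agent can quote and cite.

    Each hit is assumed to look like:
    {"text": "...", "metadata": {"source": "...", "page": ...}}
    """
    if not hits:
        # Hypothetical marker the agent's instructions could treat as "not found".
        return "NO_RESULTS"
    lines = []
    for hit in hits:
        meta = hit.get("metadata", {})
        source = meta.get("source", "unknown")
        page = meta.get("page", "unknown")
        lines.append(f"{hit['text']} (Source: {source}, Page: {page})")
    return "\n\n".join(lines)


# Placeholder hit, not real policy content.
example_hits = [
    {"text": "Placeholder policy text.",
     "metadata": {"source": "example_policy.pdf", "page": 3}},
]
tool_output = format_hits_for_agent(example_hits)
print(tool_output)
```

Returning the citation alongside each passage means the model does not have to guess where a statement came from, which supports rule 2 in the instruction above.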
Next Steps
The prototype works, but there’s ample room for improvement. To evolve this simple RAG system into a high‑performance engine, future work will focus on:
- Refining Chunking: Experiment with different chunk sizes and overlap strategies.
- Reranking Layer: Introduce a reranking step to boost the relevance of retrieved results.
- Precision Enhancements: Optimize embedding models and similarity thresholds.
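To give a feel for the reranking and threshold ideas listed above, here is a minimal sketch that first drops candidates below a similarity threshold and then reorders the rest by keyword overlap with the query. Everything in it is an assumption for illustration: the function name, the threshold value, and the scores are made up, and word overlap is only a crude stand‑in for a real reranking model.

```python
def filter_and_rerank(query: str,
                      candidates: list[tuple[str, float]],
                      min_score: float = 0.5,
                      top_n: int = 2) -> list[str]:
    """Keep candidates scoring at least min_score, then sort by query-word overlap.

    candidates is a list of (chunk_text, similarity_score) pairs.
    """
    query_terms = set(query.lower().split())
    kept = [(text, score) for text, score in candidates if score >= min_score]

    def rank_key(item: tuple[str, float]) -> tuple[int, float]:
        text, score = item
        overlap = len(query_terms & set(text.lower().split()))
        return (overlap, score)  # overlap first, similarity as tie-breaker

    ranked = sorted(kept, key=rank_key, reverse=True)
    return [text for text, _ in ranked[:top_n]]


# Illustrative candidates with invented similarity scores.
candidates = [
    ("general leave rules", 0.90),
    ("vacation carry over rules", 0.80),
    ("parking permits", 0.20),
]
reranked = filter_and_rerank("vacation carry over", candidates)
print(reranked)
```

Here the chunk with the lower similarity score moves to the top because it shares more words with the query, while the low‑scoring chunk is filtered out before reranking.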
If you haven’t used the Agent Development Kit yet and want to learn more, check out the ADK Crash Course – From Beginner to Expert, which includes a link to claim free GCP credits for the course.
Repository
The full code and data are available in my GitHub repository → here.