From Documents to Answers: How RAG Works
Source: Dev.to
RAG Indexing
The indexing phase converts raw documents into structured vector representations so they can be efficiently retrieved using similarity search later.

1) Document ingestion and preprocessing
The pipeline starts by ingesting raw documents, cleaning them, and converting them into a consistent format. In medallion-architecture terms, this is the move from the Bronze (raw) layer to the Gold (curated) layer.
Raw data:
INTRODUCTION TO DATA SCIENCE!!!
• DATA is everywhere in today's world
• MACHINE learning helps in prediction
• tools like PYTHON , R , SQL are used
After preprocessing and normalisation:
Section: Introduction to Data Science
Content: Data is everywhere in today's world. Machine learning helps in prediction. Tools like Python, R, and SQL are used.
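The cleanup step above can be sketched in a few lines of Python. The specific rules here (stripping bullet markers, de-shouting all-caps lines, fixing punctuation spacing) are illustrative, not a production pipeline:

```python
import re

def normalize(raw: str) -> str:
    """Minimal cleanup: strip bullet markers, de-shout ALL-CAPS lines,
    fix spacing before punctuation, and tame trailing exclamation marks.
    A real pipeline would also handle acronyms, encodings, tables, etc."""
    lines = []
    for line in raw.splitlines():
        line = line.lstrip("• ").strip()            # drop bullet markers
        if not line:
            continue
        if line.isupper():                          # de-shout ALL-CAPS lines
            line = line.capitalize()
        line = re.sub(r"\s+([,.!])", r"\1", line)   # "PYTHON , R" -> "PYTHON, R"
        line = re.sub(r"!+$", ".", line)            # "SCIENCE!!!" -> "science."
        lines.append(line)
    return " ".join(lines)
```

For example, `normalize("• HELLO WORLD!!!")` returns `"Hello world."`.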
2) Chunking
Chunking breaks large documents into smaller pieces so that each piece fits within the embedding model's input limit and can be matched against a query with more precision.
Typical chunking strategies include splitting by:
- Topics
- Headings
- Paragraphs
- Recursive patterns, trying delimiters in order (e.g. \n\n, then \n, then spaces)
Each chunk is stored with metadata such as chunk_id, chunk_index, etc. For large‑scale data the metadata can be saved as a JSON or Parquet file.
Example chunk.json
```json
[
  {
    "chunk_id": "ml_intro_chunk_0001",
    "chunk_index": 0,
    "doc_id": "machine_learning_basics",
    "section": "Introduction to Machine Learning",
    "content": "What is AI, Types of Algorithms",
    "page_start": 1,
    "page_end": 1,
    "char_start": 0,
    "word_count": 6,
    "language": "en"
  }
]
```
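A minimal chunker that emits metadata in roughly the shape above might look like this; the paragraph-packing rule and the max_words threshold are illustrative choices, not a standard:

```python
def chunk_text(text: str, doc_id: str, max_words: int = 50) -> list[dict]:
    """Split on blank lines, then pack paragraphs into word-bounded chunks,
    attaching the metadata fields each chunk will be stored with."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], []
    for para in paragraphs:
        current.append(para)
        if sum(len(p.split()) for p in current) >= max_words:
            chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    return [
        {
            "chunk_id": f"{doc_id}_chunk_{i:04d}",
            "chunk_index": i,
            "doc_id": doc_id,
            "content": c,
            "word_count": len(c.split()),
        }
        for i, c in enumerate(chunks)
    ]
```

In practice you would also carry section, page, and character offsets, as in the JSON above.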
3) Embeddings
This is the heart of the pipeline: each chunk is converted into a vector of numbers that captures its meaning, so that semantically similar texts end up close together in vector space.
For illustration, imagine embedding the sentence “The dog and cat are friends” into a 3‑dimensional space: it might map to a point such as [0.8, 0.2, 0.5], with related sentences landing nearby.

In practice the vectors have thousands of dimensions. After embedding, each document chunk becomes a vector embedding stored in a vector database, completing the indexing phase.
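Since running a real embedding model is out of scope here, a toy stand-in can show the shape of the operation. Hashing words into buckets is not a semantic embedding; it only demonstrates how text becomes a fixed-length, normalized vector, which is what a real model (a hosted API or a local encoder) would produce:

```python
import hashlib
import math

def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Toy 'embedding': hash each word into one of `dim` buckets and
    L2-normalize. Real systems use a trained model, but the output
    has the same shape: one fixed-length vector per chunk."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]
```

Every chunk vector produced this way (and, during querying, the query vector) lives in the same space, which is what makes similarity search meaningful.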
RAG Query
When a user submits a query, it is first converted into an embedding using the same model used during indexing; similar chunks are then retrieved from the vector database, and those results are passed to an LLM for reasoning and answer generation.

Step 1: Convert User Query to Embedding
The user query is transformed into a vector embedding using the same model that created the document embeddings.
Step 2: Similarity Search
The query vector is compared with all stored document vectors in the vector database. Using cosine similarity, the system selects the top‑k most similar chunks and forwards them to the LLM.
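Cosine-similarity retrieval can be sketched without any library. The stored (chunk_id, vector) pairs below are hypothetical; a real system would query a vector database instead of a Python list:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product of the vectors over the
    product of their lengths; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec: list[float], stored: list[tuple], k: int = 2) -> list[str]:
    """stored: (chunk_id, vector) pairs; return the k most similar chunk ids."""
    scored = [(cosine(query_vec, v), cid) for cid, v in stored]
    scored.sort(reverse=True)
    return [cid for _, cid in scored[:k]]
```

With `stored = [("what_is_ai", [1, 0]), ("history", [0, 1]), ("types_of_algorithms", [1, 1])]` and a query vector of `[1, 0]`, `top_k` returns `["what_is_ai", "types_of_algorithms"]`.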
Example
User asks: “What are the types of algorithms?”
The system compares the query vector with stored chunks such as:
- What is AI
- Types of Algorithms
- History of Computers
The chunk Types of Algorithms receives the highest similarity score and is passed to the LLM.
Step 3: LLM Response Generation
The LLM receives:
- The original user query
- The retrieved document chunks (if any)
The retrieved content is combined with the query into a single prompt, and the LLM uses it to generate a grounded final answer.
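One simple way to combine the query and the retrieved chunks into a single prompt; the template wording here is an illustrative choice, not a standard:

```python
def build_prompt(query: str, chunks: list[str]) -> str:
    """Prepend retrieved chunks as numbered context; if retrieval
    returned nothing, fall back to the bare query."""
    if not chunks:
        return query
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
```

The resulting string is what actually gets sent to the LLM in place of the raw user query.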
I’m currently learning more about RAG and Agentic AI step by step. If this helped you understand the pipeline better, feel free to like or follow for more as I share my journey.