From Documents to Answers: How RAG Works

Published: February 22, 2026 at 01:48 PM EST
3 min read
Source: Dev.to

RAG Indexing

The indexing phase converts raw documents into structured vector representations so they can be efficiently retrieved using similarity search later.

[Figure: architecture diagram of RAG indexing]

1) Document ingestion and preprocessing

The pipeline starts by ingesting raw documents, cleaning them, and converting them into a consistent format. In medallion-architecture terms, this promotes data from the raw Bronze layer to the curated Gold layer.

RAW DATA

INTRODUCTION TO DATA SCIENCE!!!
• DATA is everywhere in today's world  
• MACHINE learning helps in prediction  
• tools like PYTHON , R , SQL are used

AFTER PREPROCESSING AND NORMALISATION

Section: Introduction to Data Science  
Content: Data is everywhere in today's world. Machine learning helps in prediction. Tools like Python, R, and SQL are used.
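As a sketch of this step, the messy raw block above could be normalised roughly like this (the `preprocess` helper is a toy function invented for illustration, not a standard library call):

```python
import re

def preprocess(raw: str) -> str:
    """Toy normalisation pass: strip bullet markers, remove trailing
    exclamation noise, lowercase ALL-CAPS words, and join the lines
    into clean sentence-cased content."""
    cleaned = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop leading bullet characters left over from extraction.
        line = re.sub(r"^[•\-\*]\s*", "", line)
        # Strip trailing exclamation noise like "!!!".
        line = line.rstrip("!")
        # Normalise fully-uppercase words to lowercase.
        words = [w.lower() if w.isupper() else w for w in line.split()]
        cleaned.append(" ".join(words))
    # Sentence-case each line and join with periods.
    sentences = [s[0].upper() + s[1:] for s in cleaned if s]
    return ". ".join(s.rstrip(".") for s in sentences) + "."

raw = """INTRODUCTION TO DATA SCIENCE!!!
• DATA is everywhere in today's world
• MACHINE learning helps in prediction"""
print(preprocess(raw))
```

Real pipelines do much more (Unicode fixing, boilerplate removal, layout-aware PDF parsing), but the principle is the same: turn inconsistent raw text into uniform, searchable content.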

2) Chunking

Chunking means breaking large text into smaller pieces, so each piece fits within the embedding model's input limits and can be matched to a query more precisely.

Typical chunking strategies include splitting by:

  • Topics
  • Headings
  • Paragraphs
  • Recursive patterns using delimiters like \n\n and .
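The recursive strategy can be sketched as follows; the `recursive_split` helper and its delimiter list are illustrative assumptions, not any particular library's API:

```python
def recursive_split(text, max_len=120, delimiters=("\n\n", ". ", " ")):
    """Split text into chunks of at most max_len characters, trying the
    coarsest delimiter first (paragraphs, then sentences, then words)."""
    if len(text) <= max_len:
        return [text]
    for delim in delimiters:
        if delim in text:
            parts = text.split(delim)
            chunks, current = [], ""
            for part in parts:
                candidate = (current + delim + part) if current else part
                if len(candidate) <= max_len:
                    current = candidate
                else:
                    if current:
                        chunks.append(current)
                    if len(part) > max_len:
                        # A single piece is still too big: recurse so a
                        # finer delimiter can break it further.
                        chunks.extend(recursive_split(part, max_len, delimiters))
                        current = ""
                    else:
                        current = part
            if current:
                chunks.append(current)
            return chunks
    # No delimiter found at all: fall back to a hard character cut.
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]

text = ("Data is everywhere. " * 10).strip()
for chunk in recursive_split(text, max_len=60):
    print(len(chunk), chunk)
```

Production splitters also add overlap between neighbouring chunks so that context spanning a boundary is not lost.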


Each chunk is stored with metadata such as chunk_id, chunk_index, etc. For large‑scale data the metadata can be saved as a JSON or Parquet file.

Example chunk.json

[
  {
    "chunk_id": "ml_intro_chunk_0001",
    "chunk_index": 0,
    "doc_id": "machine_learning_basics",
    "section": "Introduction to Machine Learning",
    "content": "What is AI, Types of Algorithms",
    "page_start": 1,
    "page_end": 1,
    "char_start": 0,
    "word_count": 6,
    "language": "en"
  }
]
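To show how such records might be produced, here is a hypothetical `make_chunk_records` helper that mirrors the fields in chunk.json above (the page fields are omitted because a plain string has no pages):

```python
import json

def make_chunk_records(doc_id, section, chunks, language="en"):
    """Attach metadata to each chunk, following the chunk.json layout."""
    records = []
    char_start = 0
    for i, content in enumerate(chunks):
        records.append({
            "chunk_id": f"{doc_id}_chunk_{i:04d}",
            "chunk_index": i,
            "doc_id": doc_id,
            "section": section,
            "content": content,
            "char_start": char_start,          # offset into the source text
            "word_count": len(content.split()),
            "language": language,
        })
        char_start += len(content)
    return records

records = make_chunk_records(
    "machine_learning_basics",
    "Introduction to Machine Learning",
    ["What is AI, Types of Algorithms", "History of Computers"],
)
print(json.dumps(records[0], indent=2))
```

Offsets like `char_start` make it possible to highlight the exact source passage later, which is useful for citations in the generated answer.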

3) Embeddings

The real juice lies here: each chunk is converted into a list of numbers (a vector) that captures its meaning, so a computer can compare texts mathematically.

For illustration, embedding the sentence “The dog and cat are friends” into a 3‑dimensional space:

[Figure: vector representation in 3 dimensions]

In practice the vectors have thousands of dimensions. After embedding, each document chunk becomes a vector embedding stored in a vector database, completing the indexing phase.
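Continuing the toy 3-dimensional example, closeness in meaning is usually measured with cosine similarity. The vectors below are made-up illustrative embeddings, not real model output:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|), ranging -1..1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-d embeddings; real models output hundreds or
# thousands of dimensions.
dog = [0.9, 0.1, 0.2]
cat = [0.85, 0.15, 0.25]
car = [0.1, 0.9, 0.3]

print(cosine_similarity(dog, cat))  # high: related meanings
print(cosine_similarity(dog, car))  # lower: unrelated meanings
```

The key property is that semantically similar texts land close together in the vector space, which is exactly what the retrieval step exploits.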


RAG Query

When a user submits a query, it is first converted into an embedding using the same model used during indexing. The retrieved results are then passed to an LLM for output generation and reasoning.

[Figure: query processing diagram]

Step 1: Convert User Query to Embedding

The user query is transformed into a vector embedding using the same model that created the document embeddings.

Step 2: Similarity Search

The query vector is compared with all stored document vectors in the vector database. Using cosine similarity, the system selects the top‑k most similar chunks and forwards them to the LLM.
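This retrieval step can be sketched as a brute-force top‑k search over a tiny in-memory "vector database". All the vectors and chunk texts here are invented for illustration; real systems use approximate nearest-neighbour indexes instead of scanning everything:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def top_k(query_vec, index, k=2):
    """Rank every stored chunk by cosine similarity to the query."""
    scored = [(cosine(query_vec, vec), chunk) for chunk, vec in index.items()]
    scored.sort(reverse=True)
    return scored[:k]

# Hypothetical vector store: chunk text -> 3-d embedding.
index = {
    "What is AI":           [0.8, 0.2, 0.1],
    "Types of Algorithms":  [0.2, 0.9, 0.1],
    "History of Computers": [0.1, 0.1, 0.9],
}

# Stands in for embed("What are the types of algorithms?").
query_vec = [0.25, 0.85, 0.15]

for score, chunk in top_k(query_vec, index):
    print(f"{score:.3f}  {chunk}")
```

With these made-up vectors, "Types of Algorithms" scores highest, matching the walkthrough below.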

Example

User asks: “What are the types of algorithms?”

The system compares the query vector with stored chunks such as:

  • What is AI
  • Types of Algorithms
  • History of Computers

The chunk “Types of Algorithms” receives the highest similarity score and is passed to the LLM.

Step 3: LLM Response Generation

The LLM receives:

  • The original user query
  • The retrieved document chunks (if any)

It appends the retrieved content to the query context and generates the final answer.
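That final assembly step can be sketched as a prompt template; this layout (context first, then the question) is one common convention, not a fixed standard:

```python
def build_prompt(query, retrieved_chunks):
    """Assemble the LLM prompt: numbered retrieved context, then the
    user's question."""
    context = "\n\n".join(
        f"[{i + 1}] {chunk}" for i, chunk in enumerate(retrieved_chunks)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

prompt = build_prompt(
    "What are the types of algorithms?",
    ["What is AI, Types of Algorithms"],  # the retrieved chunk content
)
print(prompt)
```

The numbered context markers also make it easy to ask the model to cite which chunk supported each part of its answer.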


I’m currently learning more about RAG and Agentic AI step by step. If this helped you understand the pipeline better, feel free to like or follow for more as I share my journey.
