RAG - Chunking

Published: May 10, 2026 at 11:16 PM EDT

3 min read

Source: Dev.to

What is chunking

Chunking is the process of breaking data into smaller pieces called chunks. It happens before the data is fed into an embedding model: the model converts each chunk into a vector (a point in the embedding space), and the vectors are then stored in a vector database.

Why chunking matters in RAG

A single piece of text can cover several contexts while still relating to one topic.
For example, a paragraph about the Redis database may contain multiple contexts. Without chunking, an embedding model such as nomic-embed-text would convert the entire paragraph into a single vector, and that one vector would have to represent all of those contexts in the database.

Proper chunking helps retrieve only the most relevant information and avoids unrelated content. If a chunk mixes information about both Python and Java, a query about Python might also retrieve Java‑related information because both topics exist in the same chunk. Effective chunking prevents such irrelevant retrieval.

An entire document can even be stored as a single chunk, but the purpose of chunking is to split the data into smaller, meaningful sections so that only the data relevant to the user query is retrieved.

Chunking methods

Fixed chunking

Fixed chunking assigns a fixed character or token limit to every chunk, regardless of where sentences or paragraphs end.
There is no single best chunking strategy for all datasets; choosing the right chunk size usually requires a trial‑and‑error approach.
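A minimal sketch of fixed chunking by character count, written as an illustration rather than a reference implementation. The function name `fixed_chunks` and the `size` parameter are hypothetical; a token-based variant would split on tokens from whichever tokenizer the embedding model uses.

```python
def fixed_chunks(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of at most `size` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    # Step through the text in strides of `size`; the last chunk may be shorter.
    return [text[i:i + size] for i in range(0, len(text), size)]


sample = "Redis is an in-memory data store. It supports strings, hashes, and lists."
for chunk in fixed_chunks(sample, 30):
    print(repr(chunk))
```

Printing the chunks makes the trade-off visible: boundaries fall wherever the limit lands, so a sentence can be cut in the middle. Trying a few values of `size` on real data is one way to run the trial-and-error step.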

Overlapping chunking

In some cases, related information ends up far apart in vector space, for example when a chunk boundary splits one idea across two chunks, so retrieval can miss information the LLM needs.
Overlapping chunking repeats a portion of the previous chunk’s ending content at the start of each new chunk. Neighboring chunks then share some text, which helps the embedding model place related chunks closer together in the vector database.
The purpose is to improve retrieval by making semantically related chunks easier to find.
A possible downside is that irrelevant information may also be retrieved because of the overlap.

Example
Suppose Paragraph 1 is about Topic A and Paragraph 2 is about Topic B. If overlapping is applied, the chunk for Paragraph 2 begins with the end of Paragraph 1, so a query about Topic B may also retrieve some information about Topic A.
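The overlap described above can be sketched as a sliding window over characters. This is a hedged illustration: `overlapping_chunks`, `size`, and `overlap` are hypothetical names, and real pipelines often measure overlap in tokens or sentences instead.

```python
def overlapping_chunks(text: str, size: int, overlap: int) -> list[str]:
    """Split text into chunks of up to `size` characters, where each chunk
    starts with the last `overlap` characters of the previous chunk."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks


print(overlapping_chunks("abcdefghij", size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

A larger `overlap` carries more shared context between neighbors, but it also stores more duplicated text and raises the chance of the Topic A / Topic B leakage from the example.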

Semantic chunking

When two paragraphs discuss the same topic, they may already be stored nearby in the vector database even if they are not strongly related, which makes overlapping unnecessary.
Semantic chunking groups content based on meaning rather than fixed size.
Each sentence is compared with the previous chunk using a similarity threshold. If the similarity score is below the threshold, the sentence starts a new chunk; otherwise it is added to the current chunk.
Libraries such as NLTK can be used to implement semantic chunking, and the threshold value is configurable based on the use case.
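The threshold rule can be sketched with the standard library alone. Everything here is an assumption made for illustration: the regex sentence splitter stands in for a proper tokenizer such as NLTK's `sent_tokenize`, the similarity score is plain word overlap (Jaccard), and the names `semantic_chunks` and `threshold` are hypothetical.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break on whitespace after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def word_set(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Word-overlap similarity in [0, 1]; a stand-in for a real similarity score."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def semantic_chunks(text: str, threshold: float = 0.2) -> list[str]:
    chunks: list[list[str]] = []
    for sentence in split_sentences(text):
        if chunks:
            current = word_set(" ".join(chunks[-1]))
            if jaccard(word_set(sentence), current) >= threshold:
                chunks[-1].append(sentence)  # similar enough: extend the chunk
                continue
        chunks.append([sentence])  # below threshold (or first sentence): new chunk
    return [" ".join(c) for c in chunks]


text = (
    "Redis stores data in memory. Redis keeps data in memory for speed. "
    "Java runs on the JVM. The JVM runs Java bytecode."
)
for chunk in semantic_chunks(text):
    print(chunk)
```

With this sample, the two Redis sentences land in one chunk and the two Java sentences in another. Raising `threshold` produces more, smaller chunks; lowering it merges more sentences together.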

Embedding‑based chunking

Embedding‑based chunking uses embedding models instead of libraries like NLTK to measure similarity.
It embeds each sentence, calculates the cosine similarity between the sentence vectors, and groups semantically similar sentences into chunks.
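A rough sketch of the idea, under stated assumptions: `toy_embed` is a hypothetical word-count stand-in so the example runs without a model, and in practice `embed` would call a real embedding model such as nomic-embed-text. Comparing each sentence only with the one before it is one possible grouping rule, not the only one.

```python
import math
import re
from typing import Callable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def toy_embed(sentences: list[str]) -> list[list[float]]:
    """Hypothetical stand-in for an embedding model: word-count vectors
    over the vocabulary of the given sentences."""
    tokenized = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    vocab = sorted({w for tokens in tokenized for w in tokens})
    return [[float(tokens.count(w)) for w in vocab] for tokens in tokenized]


def embedding_chunks(
    sentences: list[str],
    embed: Callable[[list[str]], list[list[float]]],
    threshold: float = 0.3,
) -> list[str]:
    if not sentences:
        return []
    vectors = embed(sentences)
    chunks = [[sentences[0]]]
    for i in range(1, len(sentences)):
        # Compare each sentence vector with the previous sentence's vector.
        if cosine_similarity(vectors[i - 1], vectors[i]) >= threshold:
            chunks[-1].append(sentences[i])
        else:
            chunks.append([sentences[i]])
    return [" ".join(c) for c in chunks]


sentences = [
    "Python uses indentation for blocks.",
    "Python blocks rely on indentation.",
    "Java uses braces for blocks.",
]
for chunk in embedding_chunks(sentences, toy_embed):
    print(chunk)
```

Because `embed` is passed in as a parameter, the grouping logic stays the same when the toy function is swapped for a real model; only the threshold is likely to need retuning.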

Choosing the right chunking method

Choosing a chunking method always involves trade‑offs; there is no single strategy that works for all datasets. The best method depends on:

Dataset type – different applications may require different chunking strategies to achieve optimal RAG performance.
