Introduction to RAG
Source: Dev.to
What is a Model?
A model is essentially an equation that maps an input to an output.
Example
y = mx + c
During training, pairs of x and y values are provided, and the model learns the values of m and c that best fit that data. The learned values differ from one dataset, and therefore from one use case, to another.
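To make the idea of learning m and c concrete, here is a minimal sketch that fits y = mx + c with gradient descent. Gradient descent is only one of several ways to fit a line, and the function name, learning rate, epoch count, and sample data are assumptions chosen for illustration.

```python
def fit_line(xs, ys, learning_rate=0.01, epochs=5000):
    """Learn m and c for y = m*x + c by gradient descent on mean squared error."""
    m, c = 0.0, 0.0  # the parameters start at arbitrary values
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to m and c
        grad_m = sum(2 * (m * x + c - y) * x for x, y in zip(xs, ys)) / n
        grad_c = sum(2 * (m * x + c - y) for x, y in zip(xs, ys)) / n
        # Move each parameter a small step in the direction that lowers the error
        m -= learning_rate * grad_m
        c -= learning_rate * grad_c
    return m, c


# Hypothetical training data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

m, c = fit_line(xs, ys)
print(f"m = {m:.3f}, c = {c:.3f}")
```

On this data the learned values should end up very close to m = 2 and c = 1, the values used to generate the points.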
What is a Parameter?
A parameter is a variable that is learned during training.
- m is a parameter
- c is a parameter
More parameters allow the model to learn more complex patterns.
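As a rough illustration of why more parameters help, the sketch below fits the same curved data twice: once with two parameters (a straight line) and once with three (a line plus an x² term). The polynomial setup, function names, and data are assumptions made for this example, not something the definitions above prescribe.

```python
def predict(coeffs, x):
    # coeffs[i] multiplies x**i, so [c, m] represents the line y = m*x + c
    return sum(coeff * x**power for power, coeff in enumerate(coeffs))


def mean_squared_error(coeffs, xs, ys):
    return sum((predict(coeffs, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(xs, ys, num_params, learning_rate=0.1, epochs=2000):
    """Learn num_params coefficients by gradient descent on mean squared error."""
    coeffs = [0.0] * num_params
    n = len(xs)
    for _ in range(epochs):
        grads = [0.0] * num_params
        for x, y in zip(xs, ys):
            error = predict(coeffs, x) - y
            for power in range(num_params):
                grads[power] += 2 * error * x**power / n
        coeffs = [c - learning_rate * g for c, g in zip(coeffs, grads)]
    return coeffs


# Hypothetical curved data: y = x squared
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [x * x for x in xs]

line = fit(xs, ys, num_params=2)   # 2 parameters: can only draw a straight line
curve = fit(xs, ys, num_params=3)  # 3 parameters: can also bend

line_error = mean_squared_error(line, xs, ys)
curve_error = mean_squared_error(curve, xs, ys)
print(f"2 parameters, error = {line_error:.4f}")
print(f"3 parameters, error = {curve_error:.4f}")
```

The two-parameter model is left with a noticeable error because no straight line matches a curve, while the three-parameter model can fit these points almost exactly.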
What is Temperature?
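Temperature controls how random, and therefore how creative, the model's output is. Most APIs accept values from 0 up to 1 or 2, depending on the provider.
- Lower temperature → more predictable, focused answers
- Higher temperature → more varied, imaginative answers
Temperature is sent as a request parameter alongside the prompt. A mid-range value such as 0.5 is a reasonable starting point for balanced output.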
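One common way temperature is applied is by dividing the model's raw scores (logits) by the temperature before turning them into probabilities. The sketch below assumes that approach. The candidate words and scores are made up, and real providers may handle a temperature of exactly 0 as a special case.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [score / temperature for score in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - max_scaled) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_word(words, logits, temperature, rng=random):
    """Pick one word at random, weighted by the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(words, weights=probs, k=1)[0]


# Hypothetical candidate words and scores for "The cat sat on the ..."
words = ["mat", "sofa", "moon"]
logits = [2.0, 1.0, 0.1]

for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])

print(sample_next_word(words, logits, temperature=0.5))
```

At a low temperature almost all of the probability goes to the top-scoring word, so the output is nearly always the same. At a higher temperature the probabilities even out and less likely words get picked more often.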
SLM
SLM stands for Small Language Model.
- Typically has fewer parameters than an LLM, ranging from millions to a few billion.
- Trained for a particular domain or specific tasks.
- Training cost is usually lower than for an LLM but can still be high, depending on the use case.
Example: smallest ai, which provides smaller voice-based AI models.
LLM
LLM stands for Large Language Model.
- Usually has billions of parameters and contains knowledge from many domains.
- Considered a generalized model.
Example: gpt‑oss‑120b.
How an LLM Works
The core job of an LLM is to predict the next token (roughly, the next word or word piece) given the text so far. It generates text by predicting one token after another, each time taking into account everything that came before.
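The predict-one-word-at-a-time loop can be sketched with a toy model that simply counts which word follows which. This is a deliberate simplification: real LLMs use neural networks over tokens rather than word counts, and the corpus and function names here are invented for the example.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for every word, which words follow it and how often."""
    counts = defaultdict(Counter)
    tokens = text.lower().split()
    for current_word, next_word in zip(tokens, tokens[1:]):
        counts[current_word][next_word] += 1
    return counts


def generate(counts, start_word, max_words=5):
    """Generate text by repeatedly appending the most likely next word."""
    words = [start_word]
    for _ in range(max_words - 1):
        followers = counts.get(words[-1])
        if not followers:
            break  # the model has never seen anything after this word
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)


# Hypothetical training text
corpus = "the cat sat on the mat . the cat sat on the sofa ."
model = train_bigram_model(corpus)

print(generate(model, "the"))
```

Each new word depends only on the previous one in this toy version, whereas an LLM looks at the whole preceding text. The overall loop of predict, append, and repeat is the same idea.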
Sometimes LLMs generate incorrect information confidently; this phenomenon is called hallucination.
Example: If the model has seen plenty of text about cats and dogs but very little about lions, it may answer a question about lions with irrelevant or incorrect content.
Hallucination can be reduced by writing clear, specific prompts and providing accurate, relevant context.
What is RAG?
RAG stands for Retrieval‑Augmented Generation.
It is a method for giving an LLM access to private or external knowledge that it was not trained on, such as:
- Company policies
- HR policy documents
- Internal business documents
At query time, the pieces of this information most relevant to the question are retrieved and added to the prompt, so the LLM can generate a human-readable answer based on that content.
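The augmentation step can be as simple as pasting the retrieved text into the prompt. Here is a minimal sketch, assuming the relevant chunks have already been retrieved. The prompt wording, the function name `build_rag_prompt`, and the sample policy text are all hypothetical.

```python
def build_rag_prompt(question, retrieved_chunks):
    """Combine retrieved context and the user's question into one prompt string."""
    context = "\n".join(f"- {chunk}" for chunk in retrieved_chunks)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical chunks that a retrieval step might return
retrieved_chunks = [
    "Employees receive 20 days of paid leave per year.",
    "Unused leave can be carried over for up to 3 months.",
]

prompt = build_rag_prompt("How many paid leave days do I get?", retrieved_chunks)
print(prompt)
```

The resulting string is what would be sent to the LLM. Telling the model to rely only on the supplied context is one way to apply the earlier advice about reducing hallucination.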
Where is Private Data Stored?
For RAG, private data is usually stored in a vector database, which indexes content as numerical vectors so that similar content can be found quickly.
How Documents are Stored
Documents are split into smaller parts called chunks.
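A minimal sketch of chunking by word count is shown below. The chunk size and the overlap between neighbouring chunks are assumptions for illustration. Real pipelines often split by tokens, sentences, or headings instead.

```python
def split_into_chunks(text, chunk_size=8, overlap=2):
    """Split text into chunks of up to chunk_size words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears with some of its surrounding context.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last words are already covered by this chunk
    return chunks


# Hypothetical policy text, 12 words long
document = (
    "Employees receive twenty days of paid leave per year "
    "and may carry"
)
for chunk in split_into_chunks(document):
    print(repr(chunk))
```

With these settings the 12-word document becomes two chunks, and the last two words of the first chunk are repeated at the start of the second.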
These chunks are converted into numerical vectors (embeddings) by an embedding model and stored in the vector database.
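To show what turning text into a vector means, the sketch below uses simple word counts as a stand-in for an embedding. This is an assumption made to keep the example self-contained: real systems use a trained embedding model that captures meaning, and a plain list stands in for the vector database.

```python
def build_vocabulary(chunks):
    """Assign every distinct word a fixed position in the vector."""
    words = sorted({word for chunk in chunks for word in chunk.lower().split()})
    return {word: index for index, word in enumerate(words)}


def embed(text, vocabulary):
    """Toy embedding: count how often each vocabulary word appears in the text."""
    vector = [0.0] * len(vocabulary)
    for word in text.lower().split():
        if word in vocabulary:
            vector[vocabulary[word]] += 1.0
    return vector


# Hypothetical chunks
chunks = [
    "paid leave is 20 days",
    "remote work needs manager approval",
]

vocabulary = build_vocabulary(chunks)

# A plain list of (chunk, vector) pairs stands in for the vector database
vector_store = [(chunk, embed(chunk, vocabulary)) for chunk in chunks]

for chunk, vector in vector_store:
    print(chunk, vector)
```

Every chunk ends up as a list of numbers of the same length, which is what makes it possible to compare chunks with each other and with a query.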
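To find relevant chunks quickly, the query is also converted into a vector and compared against the stored vectors using search algorithms such as:
- KNN (K-Nearest Neighbors): an exact search that compares the query with every stored vector
- ANN (Approximate Nearest Neighbors): a faster search that trades a little accuracy for speed on large collections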
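An exact KNN search can be sketched as a brute-force comparison of the query vector with every stored vector. The vectors below are made up, and cosine similarity is assumed as the similarity measure. It is a common choice but not the only one. ANN indexes avoid this full scan and are not shown here.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two vectors by angle: close to 1 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_search(query_vector, stored, k=2):
    """Exact KNN: score every stored vector, then keep the k best matches."""
    scored = [
        (cosine_similarity(query_vector, vector), text)
        for text, vector in stored
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical chunks with made-up 3-dimensional vectors
stored = [
    ("leave policy", [0.9, 0.1, 0.0]),
    ("expense policy", [0.1, 0.9, 0.1]),
    ("remote work policy", [0.0, 0.2, 0.9]),
]
query_vector = [0.8, 0.2, 0.0]  # pretend this is the embedded user question

results = knn_search(query_vector, stored, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

Because every stored vector is scored, the cost grows with the size of the collection, which is the main reason large vector databases rely on approximate indexes instead.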