RAG is more than Vector Search

Published: December 12, 2025 at 02:36 PM EST
4 min read
Source: Dev.to

Retrieval Augmented Generation (RAG) is often associated with vector search. While that is the most common approach, any search method can be used for retrieval:

  • ✅ Vector Search
  • ✅ Web Search
  • ✅ SQL Query

These examples require txtai 9.3+.
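
Under the hood, the retrieval component is just a callable that takes a list of queries and a result limit and returns, for each query, a list of {"id", "text", "score"} dicts. The web search and SQL examples later in this article implement exactly this signature; here is a minimal sketch of the contract (the function body is illustrative only).

def retrieve(queries, limit):
    """Any callable with this signature can drive a txtai RAG pipeline."""
    results = []
    for query in queries:
        # Produce up to `limit` scored results per query
        results.append([
            {"id": 0, "text": f"Placeholder result for: {query}", "score": 1.0}
        ][:limit])
    return results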

Install dependencies

Install txtai and all dependencies.

pip install "txtai[pipeline-data]"

# Download example SQL database
wget https://huggingface.co/NeuML/txtai-wikipedia-slim/resolve/main/documents
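
Optionally, verify the download by opening the database and counting rows in the sections table (the same table queried in the SQL example later in this article):

import sqlite3

# Quick sanity check on the downloaded SQLite database
connection = sqlite3.connect("documents")
print(connection.execute("SELECT count(*) FROM sections").fetchone())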

RAG with Late Interaction

The first example demonstrates RAG with ColBERT / late interaction retrieval. txtai 9.0 added support for MUVERA and ColBERT multi-vector ranking.

We’ll:

  1. Read the ColBERT v2 paper and extract its text into sections.
  2. Build an index with a ColBERT model.
  3. Wrap that as a Reranker pipeline using the same model.
  4. Use a RAG pipeline that leverages this retrieval method.

Note: This uses the custom ColBERT Muvera Nano model (≈970K parameters). It’s surprisingly effective.

from txtai import Embeddings, RAG, Textractor
from txtai.pipeline import Reranker, Similarity

# Get text from ColBERT v2 paper
textractor = Textractor(sections=True, backend="docling")
data = textractor("https://arxiv.org/pdf/2112.01488")

# MUVERA fixed-dimensional encodings
embeddings = Embeddings(
    content=True,
    path="neuml/colbert-muvera-nano",
    vectors={"trust_remote_code": True},
)
embeddings.index(data)

# Re-rank using the same late-interaction model
reranker = Reranker(
    embeddings,
    Similarity(
        "neuml/colbert-muvera-nano",
        lateencode=True,
        vectors={"trust_remote_code": True},
    ),
)

template = """
Answer the following question using the provided context.

Question:
{question}

Context:
{context}
"""

# RAG with late interaction models
rag = RAG(reranker, "Qwen/Qwen3-4B-Instruct-2507", template=template, output="flatten")
print(rag("Write a sentence abstract about this paper", maxlength=2048))
This paper introduces ColBERTv2, a neural information retrieval model that enhances the quality and efficiency of late interaction by combining an aggressive residual compression mechanism with a denoised supervision strategy, achieving state‑of‑the‑art performance across diverse benchmarks while reducing the model's space footprint by 6–10× compared to previous methods.
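
The Reranker pipeline runs a fast first-stage search against the MUVERA index, then re-scores candidates with the late-interaction model. To see the first-stage results on their own, query the index directly:

# First-stage retrieval only (before late-interaction re-ranking)
for result in embeddings.search("What is late interaction?", 3):
    print(round(result["score"], 4), result["text"][:80])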

RAG with a Web Search

Next we run a RAG pipeline that uses a web search as the retrieval method.

from smolagents import WebSearchTool

tool = WebSearchTool()

def websearch(queries, limit):
    results = []
    for query in queries:
        result = [
            {"id": i, "text": f'{x["title"]} {x["description"]}', "score": 1.0}
            for i, x in enumerate(tool.search(query))
        ]
        results.append(result[:limit])
    return results

# RAG with a websearch
rag = RAG(websearch, "Qwen/Qwen3-4B-Instruct-2507", template=template, output="flatten")
print(rag("What is AI?", maxlength=2048))
Artificial intelligence (AI) is the capability of computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem‑solving, perception, and decision‑making. It involves technologies like machine learning, deep learning, and natural language processing, and enables machines to simulate human‑like learning, comprehension, problem solving, decision‑making, creativity, and autonomy.
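
Because websearch is an ordinary function, it can also be exercised on its own to debug what context the LLM will see:

# Inspect the raw web search results passed to the LLM
for result in websearch(["What is AI?"], 3)[0]:
    print(result["id"], result["text"][:80])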

RAG with a SQL Query

The final example shows RAG with a SQL query. We’ll use the SQLite database that is part of the txtai-wikipedia-slim embeddings dataset.

We need to translate the natural-language question into a SQL LIKE clause; an LLM extracts the keyword used in that clause.

import sqlite3
from txtai import LLM

def keyword(query):
    return llm(f"""
        Extract a keyword for this search query: {query}.
        Return only text with no other formatting or explanation.
    """)

def sqlsearch(queries, limit):
    results = []
    sql = "SELECT id, text FROM sections WHERE id LIKE ? LIMIT ?"

    for query in queries:
        # Extract a keyword for this search
        kw = keyword(query)

        # Run the SQL query
        results.append([
            {"id": uid, "text": text, "score": 1.0}
            for uid, text in cursor.execute(sql, [f"%{kw}%", limit])
        ])

    return results

# Open the database (Connection.execute creates a cursor implicitly)
cursor = sqlite3.connect("documents")

# Load the LLM
llm = LLM("Qwen/Qwen3-4B-Instruct-2507")

# RAG with a SQL query
rag = RAG(sqlsearch, llm, template=template, output="flatten")
print(rag("Tell me what happened in the 2025 World Series", maxlength=2048))
In the 2025 World Series, the Los Angeles Dodgers defeated the Toronto Blue Jays in seven games to win the championship. The series took place from October 24 to November 1 (ending early on November 2, Toronto time). Dodgers pitcher Yoshinobu Yamamoto was named the World Series MVP. The series was televised by Fox in the United States and by Sportsnet in Canada.
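
Because keyword is a plain function, it’s also easy to check which search term the LLM extracts before it reaches the LIKE clause:

# See the keyword the LLM extracts for a question
print(keyword("Tell me what happened in the 2025 World Series"))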

Wrapping up

This article showed that RAG is about much more than vector search. With txtai 9.3+, any callable can serve as the retrieval method. Enjoy!
