检索增强生成：将 LLM 连接到您的数据

发布: 2个月前 (2025年12月7日 GMT+8 11:00)

5 分钟阅读

Source: Dev.to

技术缩写参考

缩写	含义
API	Application Programming Interface（应用程序编程接口）
BERT	Bidirectional Encoder Representations from Transformers（双向编码器表示）
FAISS	Facebook AI Similarity Search（Facebook 人工智能相似性搜索）
GPU	Graphics Processing Unit（图形处理单元）
JSON	JavaScript Object Notation（JavaScript 对象表示法）
LLM	Large Language Model（大型语言模型）
RAG	Retrieval‑Augmented Generation（检索增强生成）
ROI	Return on Investment（投资回报率）
SQL	Structured Query Language（结构化查询语言）
VRAM	Video Random Access Memory（视频随机存取存储器）

为什么 LLM 需要外部数据

大型语言模型（LLM）有一个根本限制：它们的知识在训练时就被冻结了。

向 GPT‑4 提问：

“我们第三季度的销售额如何？” → ❌ 不知道你的数据
“员工手册里写了什么？” → ❌ 没有你的文档
“展示昨天的工单” → ❌ 没有实时访问权限
“工单 #45632 中客户说了什么？” → ❌ 看不到你的数据库

LLM 并不了解你的特定数据。

解决方案概览

方法	优点	缺点
微调	将模型定制化到你的数据	成本高、速度慢、静态
长上下文	简单的仅提示词方案	受上下文窗口限制，成本高
检索增强生成（RAG）	先检索相关数据再生成	灵活、可扩展、成本效益高

本文聚焦 RAG，这是生产系统中最实用的方法。

什么是检索增强生成（RAG）？

RAG 将 LLM 与专有数据大规模连接。它包括三个阶段：

索引（离线） – 将文档处理为向量嵌入并存入向量数据库。
检索（查询时） – 对用户查询进行嵌入，搜索向量库，返回 top‑k 最相关的片段。
生成 – 将检索到的片段与原始查询一起输入 LLM，生成最终答案。

生活中的类比：研究助理

阶段	助理的工作
索引	阅读所有公司文档，做成有序笔记，便于快速检索。
检索	当你提问时，搜索笔记并挑出最相关的文档。
生成	阅读检索到的文档，组织答案并回复。

RAG 工作流图

┌─────────────────────────────────────────────────────────┐
│                    INDEXING (Offline)                    │
├─────────────────────────────────────────────────────────┤
│ Documents → Chunking → Embeddings → Vector Database       │
│ "handbook.pdf" → paragraphs → vector representations      │
│ "policies.docx" → paragraphs → vector representations      │
│ "faqs.md"      → paragraphs → vector representations      │
└─────────────────────────────────────────────────────────┘
                         ↓
┌─────────────────────────────────────────────────────────┐
│                  RETRIEVAL (Query Time)                  │
├─────────────────────────────────────────────────────────┤
│ User Query → Embed Query → Search Vector DB → Top‑K      │
│ "What's the return policy?" → vector → find similar chunks │
│ → return 5 most relevant chunks                           │
└─────────────────────────────────────────────────────────┘
                         ↓
┌─────────────────────────────────────────────────────────┐
│                  GENERATION (Response)                   │
├─────────────────────────────────────────────────────────┤
│ Retrieved Docs + Query → LLM → Final Answer               │
│ Context: [5 relevant chunks about returns]                │
│ Question: "What is the return policy?"                    │
│ LLM Output: "Our return policy allows returns within 30   │
│ days of purchase. Items must be in original condition..." │
└─────────────────────────────────────────────────────────┘

安装

pip install langchain
pip install chromadb      # 向量数据库
pip install sentence-transformers  # 嵌入模型
pip install litellm      # LLM 接口
pip install pypdf        # PDF 处理

Python 示例：加载与切分文档

from typing import List
import re

def load_documents(file_paths: List[str]) -> List[str]:
    """Load plain‑text documents from a list of file paths."""
    documents = []
    for path in file_paths:
        with open(path, "r", encoding="utf-8") as f:
            documents.append(f.read())
    return documents

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """
    Split `text` into overlapping chunks.

    Parameters
    ----------
    text : str
        Input text to chunk.
    chunk_size : int, default 500
        Target size of each chunk (characters).
    overlap : int, default 50
        Number of characters to overlap between consecutive chunks.
    """
    # Simple sentence‑aware chunking
    sentences = re.split(r"(? chunk_size and current_chunk:
            chunks.append(" ".join(current_chunk))

            # Preserve overlap for the next chunk
            overlap_sentences = []
            overlap_len = 0
            for s in reversed(current_chunk):
                if overlap_len + len(s)  List[dict]:
        """
        Retrieve the `top_k` most similar chunks for `query_text`.

        Returns
        -------
        List[dict] with keys `id`, `document`, `metadata`, `distance`.
        """
        query_emb = self.embedding_model.encode([query_text]).tolist()
        results = self.collection.query(
            query_embeddings=query_emb,
            n_results=top_k,
            include=["documents", "metadatas", "distances", "ids"]
        )
        # Re‑format results for easier consumption
        hits = []
        for i in range(len(results["ids"][0])):
            hits.append({
                "id": results["ids"][0][i],
                "document": results["documents"][0][i],
                "metadata": results["metadatas"][0][i],
                "distance": results["distances"][0][i],
            })
        return hits

现在你可以将切分逻辑与 VectorStore 结合，构建完整的 RAG 流程：

加载原始文档。
使用 chunk_text 将其切分。
将切分后的块插入 VectorStore。
查询时，对用户问题进行嵌入，检索 top‑k 块，并将拼接后的上下文连同原始问题一起传给你的 LLM（例如通过 litellm 或 langchain）。

本文结束。

检索增强生成：将 LLM 连接到您的数据

技术缩写参考

为什么 LLM 需要外部数据

解决方案概览

什么是检索增强生成（RAG）？

生活中的类比：研究助理

RAG 工作流图

安装

Python 示例：加载与切分文档

相关文章

🔍 Multi-Query Retriever RAG：如何显著提升您的 AI 文档检索准确性

RAG vs 微调 vs Prompt Engineering：选择正确 AI 策略的终极指南

像 HATEOAS 思考：Agentic RAG 如何动态导航知识

Chunk 边界与元数据对齐：RAG 不稳定性的隐藏根源