Building a Production RAG Pipeline on AWS with Bedrock and OpenSearch

Published: 1 day ago (March 9, 2026, 02:54 GMT+8)

2 min read

Source: Dev.to

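RAG (Retrieval-Augmented Generation) is how enterprises deploy large language models (LLMs) without fine-tuning. Most tutorials stop at the demo stage, but RAG in production requires additional considerations.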

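RAG vs. Fine-Tuning vs. Prompt Engineering

Approach	Cost	Data Freshness	Accuracy	Complexity
RAG	Medium	Real-time	High (when retrieval is good)	Medium
Fine-tuning	High	Static (requires retraining)	High	High
Prompt Engineering	Low	Static	Variable	Low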

Architecture

The pipeline follows this flow:

Documents → Chunking → Embedding → Vector Store → Query → Retrieval → LLM → Response
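
The chunking step in the flow above can be sketched as follows. This is a minimal illustration, not the article's own implementation: it assumes whitespace-separated words stand in for tokens (a real pipeline would count tokens with the embedding model's tokenizer), and the function name chunk_text and its parameters are hypothetical.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> List[str]:
    """Split text into overlapping chunks of at most chunk_size words.

    Words are used as a rough stand-in for tokens (an assumption made
    for this sketch only).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk made only of overlap is emitted.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Each chunk shares its last `overlap` words with the start of the next one, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.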

Python Implementation

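import boto3
import json

# Bedrock runtime client, used for both the embedding and the generation call.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
# Note: "opensearchserverless" is the control-plane client (collections,
# policies). It does not run search queries; those go to the collection's
# own endpoint. It is unused in the function below.
opensearch = boto3.client("opensearchserverless")

def query_knowledge_base(question: str, collection_id: str) -> str:
    # Generate an embedding for the question
    embed_response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": question})
    )
    query_embedding = json.loads(embed_response["body"].read())["embedding"]

    # Search the OpenSearch vector store.
    # search_vectors is not defined in this article; it is expected to return
    # a list of dicts that each carry a "text" field.
    results = search_vectors(query_embedding, collection_id, k=5)
    context = "\n".join([r["text"] for r in results])

    # Generate an answer grounded in the retrieved context
    prompt = f"""Based on the following context, answer the question.

Context: {context}

Question: {question}

Answer:"""

    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 1024
        })
    )
    # The Messages API returns a list of content blocks; take the first text block.
    return json.loads(response["body"].read())["content"][0]["text"]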
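
The listing calls search_vectors without showing it. As a rough sketch of what such a helper might build and parse, the code below constructs an OpenSearch k-NN query body and converts a search response into the list of dicts the function expects. The helper names build_knn_query and hits_to_results and the field names "embedding" and "text" are assumptions about the index mapping, not details from the article. Sending the request to the collection endpoint (with a client that signs requests for AWS) is deliberately left out.

```python
from typing import Any, Dict, List


def build_knn_query(query_embedding: List[float], k: int = 5,
                    vector_field: str = "embedding",
                    text_field: str = "text") -> Dict[str, Any]:
    """Build an OpenSearch k-NN search body for the given query vector.

    vector_field and text_field are assumed names; they must match
    whatever the index mapping actually uses.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return {
        "size": k,
        "_source": [text_field],  # return only the chunk text
        "query": {
            "knn": {
                vector_field: {"vector": list(query_embedding), "k": k}
            }
        },
    }


def hits_to_results(response: Dict[str, Any],
                    text_field: str = "text") -> List[Dict[str, Any]]:
    """Convert a search response into [{"text": ..., "score": ...}, ...]."""
    return [
        {"text": hit["_source"][text_field], "score": hit["_score"]}
        for hit in response["hits"]["hits"]
    ]
```

Keeping query construction and response parsing as pure functions makes them easy to unit test without a live collection.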

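Hallucination Mitigation

Chunk size matters: 512 tokens with a 50-token overlap works best.
Hybrid search: combine semantic search with keyword (BM25) search.
Citation grounding: force the model to cite its source chunks.
Confidence scoring: filter out low-relevance retrievals (cosine similarity).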
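
Two of the points above, confidence scoring and citation grounding, can be illustrated with a small sketch. It assumes each retrieved chunk is a dict with "text" and "embedding" keys. The threshold of 0.5 is an arbitrary placeholder that would need tuning on real data, and the function names are hypothetical.

```python
import math
from typing import Any, Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_by_similarity(query_embedding: Sequence[float],
                         chunks: List[Dict[str, Any]],
                         min_score: float = 0.5) -> List[Dict[str, Any]]:
    """Keep chunks whose similarity to the query is at least min_score,
    sorted from most to least similar."""
    kept = []
    for chunk in chunks:
        score = cosine_similarity(query_embedding, chunk["embedding"])
        if score >= min_score:
            kept.append({"text": chunk["text"], "score": score})
    return sorted(kept, key=lambda c: c["score"], reverse=True)


def build_cited_context(chunks: List[Dict[str, Any]]) -> str:
    """Number each chunk so the prompt can ask the model to cite [1], [2], ..."""
    return "\n".join(f"[{i}] {c['text']}" for i, c in enumerate(chunks, start=1))
```

The numbered context can then replace the plain joined context in the prompt, together with an instruction such as "cite the bracketed number of every chunk you rely on".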
