# Building Production RAG Pipelines on AWS with Bedrock and OpenSearch
Source: Dev.to
RAG (Retrieval‑Augmented Generation) is how enterprises are deploying LLMs without fine‑tuning. Most tutorials stop at the demo stage, but a production RAG pipeline also has to get retrieval quality and hallucination control right.
## RAG vs Fine‑Tuning vs Prompt Engineering
| Approach | Cost | Data Freshness | Accuracy | Complexity |
|---|---|---|---|---|
| RAG | Medium | Real‑time | High (with good retrieval) | Medium |
| Fine‑Tuning | High | Static (retraining needed) | High | High |
| Prompt Engineering | Low | Static | Variable | Low |
## Architecture
The pipeline follows this flow:
Documents → Chunking → Embeddings → Vector Store → Query → Retrieval → LLM → Response
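The ingestion side of this flow (Documents → Chunking) can be sketched with a simple sliding‑window chunker. As an illustrative assumption, token counts are approximated by whitespace‑delimited words here; a real pipeline would count tokens with the embedding model's tokenizer:

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks. Sizes are in words as a
    rough stand-in for tokens (illustrative assumption)."""
    words = text.split()
    step = chunk_size - overlap  # advance so consecutive chunks share `overlap` words
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last chunk already reached the end of the document
    return chunks
```

Each chunk is then embedded and written to the vector store along with its source text.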
## Python Implementation
```python
import boto3
import json

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
# Control-plane client (collection management); data-plane vector queries
# go through the OpenSearch endpoint itself, e.g. via opensearch-py
opensearch = boto3.client("opensearchserverless")

def query_knowledge_base(question: str, collection_id: str) -> str:
    # Generate an embedding for the question
    embed_response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": question}),
    )
    query_embedding = json.loads(embed_response["body"].read())["embedding"]

    # Search the OpenSearch vector store (search_vectors is a helper
    # that runs a k-NN query against the collection)
    results = search_vectors(query_embedding, collection_id, k=5)
    context = "\n".join(r["text"] for r in results)

    # Generate an answer grounded in the retrieved context
    prompt = f"""Based on the following context, answer the question.

Context: {context}

Question: {question}

Answer:"""

    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 1024,
        }),
    )
    return json.loads(response["body"].read())["content"][0]["text"]
```
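The `search_vectors` helper is left abstract above. One way to implement it is to build an OpenSearch k‑NN query body and send it through the `opensearch-py` data‑plane client. This is a sketch under assumptions: the index stores its vector in a `knn_vector` field named `embedding` and the chunk text in a `text` field (both names are illustrative):

```python
def build_knn_query(embedding: list[float], k: int = 5) -> dict:
    # k-NN query body for an OpenSearch index whose vector field is
    # named "embedding" (field names are assumptions for illustration)
    return {
        "size": k,
        "query": {"knn": {"embedding": {"vector": embedding, "k": k}}},
        "_source": ["text"],
    }

# Executing it requires a signed opensearch-py client, roughly:
#   client = OpenSearch(hosts=[...], http_auth=AWSV4SignerAuth(...), ...)
#   hits = client.search(index="docs", body=build_knn_query(vec))["hits"]["hits"]
#   results = [{"text": h["_source"]["text"], "score": h["_score"]} for h in hits]
```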
## Hallucination Mitigation
- Chunk size matters – 512 tokens with a 50‑token overlap works best.
- Hybrid search – combine semantic and keyword (BM25) search.
- Citation grounding – force the model to cite source chunks.
- Confidence scoring – filter out low‑relevance retrievals with a cosine‑similarity threshold.
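The confidence‑scoring step can be as simple as dropping retrievals whose embedding falls below a similarity cutoff. A minimal sketch, assuming each result carries its stored embedding and using an illustrative 0.7 threshold that you would tune per corpus:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def filter_by_confidence(query_vec: list[float], results: list[dict],
                         threshold: float = 0.7) -> list[dict]:
    # Keep only chunks close enough to the query; each result is assumed
    # to include its "embedding" alongside its text (illustrative schema)
    return [r for r in results
            if cosine_similarity(query_vec, r["embedding"]) >= threshold]
```

Anything that survives the filter goes into the prompt context; if nothing survives, the pipeline can answer "not found" instead of letting the model guess.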