Azure AI Search Advanced RAG with Terraform: Hybrid Search, Semantic Ranking, and Agentic Retrieval 🧠
Source: Dev.to
Hybrid Search & Agentic Retrieval in Azure AI Search
Vector search alone leaves relevance on the table.
Hybrid search with semantic ranking, chunking strategies, metadata filtering, strictness tuning, and the new agentic retrieval pipeline turns Azure AI Search into a production RAG system – all wired through Terraform.
Recap (RAG Post 1)
- Deployed Azure AI Search with a basic index and connected it to Azure OpenAI.
- Retrieval quality was mediocre – users asked nuanced questions and got incomplete or irrelevant answers.
Fix: Not a better generation model, but better retrieval.
Azure AI Search offers the most sophisticated built‑in retrieval pipeline of the three major clouds:
- Hybrid search – BM25 keyword matching + vector similarity via Reciprocal Rank Fusion (RRF).
- Transformer‑based semantic ranker – deep re‑scoring of the top results.
- Metadata filtering & strictness controls.
- Agentic retrieval – automatically decomposes complex queries.
⚠️ Important: Azure OpenAI “On‑Your‑Data” is deprecated. Microsoft recommends migrating to Foundry Agent Service with Foundry IQ. The patterns below use direct Azure AI Search integration, which works with both the current and future architecture.
Chunking Strategies
Azure AI Search supports two chunking approaches via its indexer pipeline:
| Approach | How it works | Pros | Cons |
|---|---|---|---|
| Fixed‑size chunking (Text Split skill) | Splits by token or character count with configurable overlap. | Simple, predictable, cost‑effective. | Ignores document structure. |
| Structure‑aware chunking (Document Layout skill) | Uses Azure Document Intelligence to recognize headers, sections, and layout elements. | Preserves document hierarchy. | Adds per‑page processing cost. |
Recommended Chunk Sizes (Microsoft benchmark)
| Chunk Size | Overlap | Best For | Trade‑off |
|---|---|---|---|
| 256 tokens | 25 % | Short FAQs, Q&A pairs | High precision, less context |
| 512 tokens | 25 % | General documents (default) | Best balance of precision & context |
| 1024 tokens | 10‑15 % | Long technical/legal docs | More context, risk of noise |
Key insight: Preserve sentence boundaries when chunking. Splitting mid‑sentence degrades embedding quality and retrieval accuracy.
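To make the sentence-boundary point concrete, here is a minimal, illustrative chunker sketch (not the Text Split skill itself): it packs whole sentences up to a token budget and carries roughly the last 25 % of each chunk forward as overlap. "Tokens" are approximated by whitespace-separated words for simplicity.

```python
import re

def chunk_text(text, max_tokens=512, overlap_ratio=0.25):
    """Greedy chunker that packs whole sentences up to a token budget,
    carrying ~overlap_ratio of each chunk's tail into the next chunk.
    Illustrative sketch only; tokens are approximated by word count."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, current_len = [], [], 0
    for sent in sentences:
        n = len(sent.split())
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            # Carry trailing whole sentences forward as overlap.
            overlap, size = [], 0
            for s in reversed(current):
                if size >= max_tokens * overlap_ratio:
                    break
                overlap.insert(0, s)
                size += len(s.split())
            current, current_len = overlap, size
        current.append(sent)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because chunks only ever break between sentences, every chunk is a well-formed unit of text, which is exactly what keeps embedding quality from degrading.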
Layered Retrieval Architecture
1️⃣ Hybrid Search (BM25 + Vector) + RRF Fusion
```python
from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizableTextQuery

credential = DefaultAzureCredential()  # or AzureKeyCredential(api_key)

search_client = SearchClient(
    endpoint=search_endpoint,  # https://<your-service>.search.windows.net
    index_name="company-docs",
    credential=credential,
)

vector_query = VectorizableTextQuery(
    text="What are the penalties for late delivery?",
    k_nearest_neighbors=50,
    fields="contentVector",
)

results = search_client.search(
    search_text="penalties late delivery",  # BM25 keyword query
    vector_queries=[vector_query],          # Vector query
    select=["title", "content", "source"],
    top=50,
)
```
- Why both?
- Vector handles synonyms & paraphrasing.
- Keyword catches product codes, policy numbers, exact terminology.
- RRF merges the two result sets, delivering the best of both worlds.
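The RRF merge itself is simple enough to sketch: each ranked list contributes `1 / (k + rank)` per document, and documents appearing high in both lists accumulate the largest fused scores. The constant `k = 60` below is the commonly cited default, not an Azure-documented value.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: each ranked list of document IDs
    contributes 1 / (k + rank); k damps the weight of top ranks.
    Returns IDs sorted by fused score, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc3", "doc1", "doc7"]   # keyword ranking
vector_hits = ["doc1", "doc5", "doc3"] # vector ranking
fused = rrf_fuse([bm25_hits, vector_hits])  # doc1 and doc3 rise to the top
```

Note that a document found by only one retriever (doc5, doc7) still survives fusion; it just ranks below documents both retrievers agree on.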
2️⃣ Semantic Ranker (Transformer Re‑scoring)
```python
results = search_client.search(
    search_text="penalties late delivery",
    vector_queries=[vector_query],
    query_type="semantic",
    semantic_configuration_name="default",
    top=50,
)

for result in results:
    print(f"Score: {result['@search.reranker_score']:.2f} - {result['title']}")
```
- The ranker produces a calibrated score 0 → 4 (irrelevant → excellent) by applying cross‑attention between query and document text.
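Because the score is calibrated across queries, you can apply a fixed quality gate to the results. A sketch, assuming a cutoff of 2.0 (tune this per workload; it is not an Azure default):

```python
def filter_by_reranker(results, threshold=2.0):
    """Keep only hits whose semantic reranker score clears the bar.
    Works on the dict-like rows returned by SearchClient.search();
    the score key is '@search.reranker_score' (None when semantic
    ranking was not applied)."""
    return [
        r for r in results
        if (r.get("@search.reranker_score") or 0.0) >= threshold
    ]
```

Dropping sub-threshold chunks before prompt assembly keeps marginal matches from diluting the LLM's context.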
3️⃣ Terraform Configuration for Semantic Ranking
```hcl
resource "azurerm_search_service" "this" {
  name                = "${var.environment}-${var.project}-search"
  resource_group_name = azurerm_resource_group.this.name
  location            = var.location
  sku                 = var.search_sku
  semantic_search_sku = var.semantic_search_sku

  identity {
    type = "SystemAssigned"
  }
}
```
Environment variables
```hcl
# environments/dev.tfvars
search_sku          = "basic"
semantic_search_sku = "free"     # Limited queries/month

# environments/prod.tfvars
search_sku          = "standard"
semantic_search_sku = "standard" # Unlimited semantic queries
```
Benchmark takeaway: Hybrid + Semantic ranking consistently finds the best content at every result‑set size. Pure vector search misses many relevant hits that hybrid catches, and without semantic ranking the top‑relevant result often lands at position 7‑8 instead of 1.
Agentic Retrieval (2025‑11‑01‑preview API)
Agentic retrieval automatically decomposes complex queries into sub‑queries, runs each through the hybrid + semantic pipeline, and merges the outcomes.
Example
User: "Compare our 2024 and 2025 refund policies and highlight what changed"
Decomposition
- Sub‑query 1: “2024 refund policy terms and conditions”
- Sub‑query 2: “2025 refund policy terms and conditions”
- Sub‑query 3: “changes updates refund policy”
Each sub‑query follows:
Hybrid search → Semantic rerank top 50 → Merge
The orchestration lives in the Knowledge Base object; the underlying service, index, and semantic config remain unchanged.
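The orchestration pattern can be sketched as plain Python. This is illustrative only: in the real service the decomposition is an LLM call and the merge happens inside the Knowledge Base object; here `decompose` and `hybrid_search` are hypothetical callables you would supply.

```python
def agentic_retrieve(question, decompose, hybrid_search, top_k=5):
    """Illustrative agentic-retrieval loop: split the question into
    sub-queries, run each through the hybrid + semantic pipeline, and
    merge results with simple first-seen de-duplication."""
    merged, seen = [], set()
    for sub_query in decompose(question):
        for doc in hybrid_search(sub_query):
            if doc["id"] not in seen:
                seen.add(doc["id"])
                merged.append(doc)
    return merged[:top_k]
```

The key property to notice: a chunk relevant to several sub-queries is retrieved once, while each sub-query still contributes its own unique hits to the merged set.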
When to use Agentic Retrieval
- Complex questions with multiple intents.
- Comparative queries (e.g., version‑to‑version analysis).
- Queries that span multiple document categories.
Impact: Adds latency but improves answer quality by up to 40 % on complex queries (Microsoft benchmark).
TL;DR Checklist
- ✅ Use 512‑token chunks with 25 % overlap and preserve sentence boundaries.
- ✅ Enable Hybrid search (BM25 + vector) with RRF fusion.
- ✅ Turn on Semantic ranking (Standard tier, `semantic_search_sku = "standard"`).
- ✅ For multi‑intent or comparative questions, switch to Agentic retrieval via the Knowledge Base API.
- ✅ Manage everything with Terraform for reproducibility across dev / prod environments.
With these patterns, Azure AI Search becomes a production‑grade RAG engine that delivers precise, context‑rich answers at scale. 🚀
Scoped Retrieval with Filterable Fields
Filters run before vector search, narrowing the candidate set:
```python
results = search_client.search(
    search_text="refund policy changes",
    vector_queries=[vector_query],
    query_type="semantic",
    semantic_configuration_name="default",
    filter="department eq 'legal' and year ge 2024",
    top=20,
)
```
Terraform note:
Filterable fields must be defined with filterable: true in the index schema. The schema is usually managed via the Portal, SDK, or REST API (not Terraform), while the search service and its SKU/capabilities are Terraform‑managed.
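As a sketch of what that schema looks like, here is a minimal REST index definition fragment matching the `department`/`year` filter used above (field list abbreviated; the vector and semantic configuration sections are omitted):

```json
{
  "name": "company-docs",
  "fields": [
    { "name": "id", "type": "Edm.String", "key": true },
    { "name": "content", "type": "Edm.String", "searchable": true },
    { "name": "department", "type": "Edm.String", "filterable": true, "facetable": true },
    { "name": "year", "type": "Edm.Int32", "filterable": true, "sortable": true }
  ]
}
```

Only fields marked `"filterable": true` can appear in an OData `filter` expression, so decide on your filterable metadata up front: changing a field's attributes later requires rebuilding the index.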
Azure OpenAI Data‑Source Integration
Two parameters control retrieval quality:
| Parameter | Range | Description |
|---|---|---|
| strictness | 1‑5 | How aggressively irrelevant chunks are filtered out. Higher values = stricter filtering. |
| top_n_documents | 1‑20 | Number of chunks to include in the LLM prompt after filtering & reranking. More chunks = more context (higher token cost). |
Strictness Levels
| Strictness | Behavior | Typical Use‑Case |
|---|---|---|
| 1‑2 | Lenient – includes borderline results | Exploratory questions, broad topics |
| 3 | Balanced (default) | General purpose |
| 4‑5 | Strict – only highly relevant results | Precise factual lookups, compliance |
Example Call
```python
completion = client.chat.completions.create(
    model=deployment,
    messages=[{"role": "user", "content": "What changed in the refund policy?"}],
    extra_body={
        "data_sources": [{
            "type": "azure_search",
            "parameters": {
                "endpoint": search_endpoint,
                "index_name": "company-docs",
                "query_type": "vector_semantic_hybrid",
                "semantic_configuration": "default",
                "strictness": 4,
                "top_n_documents": 5,
                "authentication": {
                    "type": "system_assigned_managed_identity"
                },
            },
        }]
    },
)
```
Tuning Guide
- Model says “I don’t have enough information” → lower `strictness` or raise `top_n_documents`.
- Answers contain irrelevant context → raise `strictness` or lower `top_n_documents`.
Terraform Example (rag/main.tf)
```hcl
resource "azurerm_search_service" "this" {
  name                = "${var.environment}-${var.project}-search"
  resource_group_name = azurerm_resource_group.this.name
  location            = var.location
  sku                 = var.search_sku
  semantic_search_sku = var.semantic_search_sku
  replica_count       = var.search_replicas
  partition_count     = var.search_partitions

  identity {
    type = "SystemAssigned"
  }

  tags = var.tags
}

# Embedding model deployment
resource "azurerm_cognitive_deployment" "embedding" {
  name                 = "text-embedding-3-small"
  cognitive_account_id = azurerm_cognitive_account.openai.id

  model {
    format  = "OpenAI"
    name    = "text-embedding-3-small"
    version = "1"
  }

  sku {
    name     = "Standard"
    capacity = var.embedding_capacity
  }
}
```
Variable Files
environments/dev.tfvars
```hcl
search_sku          = "basic"
semantic_search_sku = "free"
search_replicas     = 1
search_partitions   = 1
embedding_capacity  = 30
strictness          = 3
top_n_documents     = 5
```
environments/prod.tfvars
```hcl
search_sku          = "standard"
semantic_search_sku = "standard"
search_replicas     = 2
search_partitions   = 1
embedding_capacity  = 120
strictness          = 4
top_n_documents     = 5
```
Feature Comparison
| Feature | Azure AI Search | AWS Bedrock KB | GCP RAG Engine |
|---|---|---|---|
| Chunking | Fixed‑size + Document Layout skill | Fixed, hierarchical, semantic, Lambda | Fixed‑size only |
| Hybrid search | BM25 + vector via RRF (built‑in) | Supported on OpenSearch | Alpha‑weighted dense/sparse |
| Semantic reranking | Built‑in transformer ranker (L2) | Cohere Rerank | Rank Service + LLM Ranker |
| Query decomposition | Agentic retrieval (native) | Native API parameter | Not built‑in |
| Metadata filtering | Filterable index fields + OData | JSON metadata files in S3 | Filter string at query time |
| Strictness control | 1‑5 scale on data source | Not built‑in | Vector distance threshold |
| Reranker score range | 0‑4 (calibrated, cross‑query consistent) | Model‑dependent | Model‑dependent |
Azure’s advantage is the most mature retrieval pipeline – three layers (hybrid, semantic ranking, agentic) that compose together. The semantic ranker’s calibrated scoring also enables consistent quality thresholds across different indexes and query patterns.
Suggested Configurations
| Situation | Query Type | Semantic Ranker | Strictness | top_n_documents |
|---|---|---|---|---|
| Getting started | vector_simple_hybrid | Free tier | 3 | 5 |
| Production (general) | vector_semantic_hybrid | Standard | 3 | 5 |
| Precise factual lookup | vector_semantic_hybrid | Standard | 4‑5 | 3 |
| Broad research queries | vector_semantic_hybrid | Standard | 2 | 10 |
| Complex multi‑part questions | Agentic retrieval | Standard | 3 | 5 |
Recommendation: Start with vector_semantic_hybrid on the Standard tier – the default benchmarked by Microsoft. Add agentic retrieval for especially complex query patterns.
Series Context
- Post 1: Azure AI Search RAG – Basic Setup 🔍
- Post 2: Advanced RAG – Hybrid Search, Semantic Ranking, Agentic Retrieval (you’re here) 🧠
Your RAG pipeline now leverages the full Azure AI Search arsenal: hybrid search for recall, semantic ranking for precision, agentic retrieval for complex queries, metadata filtering for scope, and strictness tuning for noise control – all driven by Terraform variables per environment.
Found this helpful? Follow for the full RAG Pipeline with Terraform series! 💬