Azure AI Search Advanced RAG with Terraform: Hybrid Search, Semantic Ranking, and Agentic Retrieval 🧠
Source: Dev.to
Hybrid Search & Agentic Retrieval in Azure AI Search
Vector search alone leaves relevance on the table.
Hybrid search with semantic ranking, chunking strategies, metadata filtering, strictness tuning, and the new agentic retrieval pipeline turns Azure AI Search into a production RAG system – all wired through Terraform.
Recap (RAG Post 1)
- Deployed Azure AI Search with a basic index and connected it to Azure OpenAI.
- Retrieval quality was mediocre – users asked nuanced questions and got incomplete or irrelevant answers.
Fix: Not a better generation model, but better retrieval.
Azure AI Search offers the most sophisticated built‑in retrieval pipeline of the three major clouds:
- Hybrid search – BM25 keyword matching + vector similarity via Reciprocal Rank Fusion (RRF).
- Transformer‑based semantic ranker – deep re‑scoring of the top results.
- Metadata filtering & strictness controls.
- Agentic retrieval – automatically decomposes complex queries.
⚠️ Important: Azure OpenAI “On‑Your‑Data” is deprecated. Microsoft recommends migrating to Foundry Agent Service with Foundry IQ. The patterns below use direct Azure AI Search integration, which works with both the current and future architecture.
Chunking Strategies
Azure AI Search supports two chunking approaches via its indexer pipeline:
| Approach | How it works | Pros | Cons |
|---|---|---|---|
| Fixed‑size chunking (Text Split skill) | Splits by token or character count with configurable overlap. | Simple, predictable, cost‑effective. | Ignores document structure. |
| Structure‑aware chunking (Document Layout skill) | Uses Azure Document Intelligence to recognize headers, sections, and layout elements. | Preserves document hierarchy. | Adds per‑page processing cost. |
Recommended Chunk Sizes (Microsoft benchmark)
| Chunk Size | Overlap | Best For | Trade‑off |
|---|---|---|---|
| 256 tokens | 25 % | Short FAQs, Q&A pairs | High precision, less context |
| 512 tokens | 25 % | General documents (default) | Best balance of precision & context |
| 1024 tokens | 10‑15 % | Long technical/legal docs | More context, risk of noise |
Key insight: Preserve sentence boundaries when chunking. Splitting mid‑sentence degrades embedding quality and retrieval accuracy.
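To make the sentence-boundary point concrete, here is a minimal, illustrative chunker sketch (not the Text Split skill itself): it packs whole sentences up to a token budget and carries roughly the last 25 % of each chunk forward as overlap. "Tokens" are approximated by whitespace-separated words for simplicity.

```python
import re

def chunk_text(text, max_tokens=512, overlap_ratio=0.25):
    """Greedy chunker that packs whole sentences up to a token budget,
    carrying ~overlap_ratio of each chunk's tail into the next chunk.
    Illustrative sketch only; tokens are approximated by word count."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, current_len = [], [], 0
    for sent in sentences:
        n = len(sent.split())
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            # Carry trailing whole sentences forward as overlap.
            overlap, size = [], 0
            for s in reversed(current):
                if size >= max_tokens * overlap_ratio:
                    break
                overlap.insert(0, s)
                size += len(s.split())
            current, current_len = overlap, size
        current.append(sent)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because chunks only ever break between sentences, every chunk is a well-formed unit of text, which is exactly what keeps embedding quality from degrading.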
Layered Retrieval Architecture
1️⃣ Hybrid Search (BM25 + Vector) + RRF Fusion
```python
from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizableTextQuery

credential = DefaultAzureCredential()  # or AzureKeyCredential(api_key)

search_client = SearchClient(
    endpoint=search_endpoint,  # https://<your-service>.search.windows.net
    index_name="company-docs",
    credential=credential,
)

vector_query = VectorizableTextQuery(
    text="What are the penalties for late delivery?",
    k_nearest_neighbors=50,
    fields="contentVector",
)

results = search_client.search(
    search_text="penalties late delivery",  # BM25 keyword query
    vector_queries=[vector_query],          # Vector query
    select=["title", "content", "source"],
    top=50,
)
```
- Why both?
- Vector handles synonyms & paraphrasing.
- Keyword catches product codes, policy numbers, exact terminology.
- RRF merges the two result sets, delivering the best of both worlds.
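The RRF merge itself is simple enough to sketch: each ranked list contributes `1 / (k + rank)` per document, and documents appearing high in both lists accumulate the largest fused scores. The constant `k = 60` below is the commonly cited default, not an Azure-documented value.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: each ranked list of document IDs
    contributes 1 / (k + rank); k damps the weight of top ranks.
    Returns IDs sorted by fused score, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc3", "doc1", "doc7"]   # keyword ranking
vector_hits = ["doc1", "doc5", "doc3"] # vector ranking
fused = rrf_fuse([bm25_hits, vector_hits])  # doc1 and doc3 rise to the top
```

Note that a document found by only one retriever (doc5, doc7) still survives fusion; it just ranks below documents both retrievers agree on.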
2️⃣ Semantic Ranker (Transformer Re‑scoring)
```python
results = search_client.search(
    search_text="penalties late delivery",
    vector_queries=[vector_query],
    query_type="semantic",
    semantic_configuration_name="default",
    top=50,
)

for result in results:
    print(f"Score: {result['@search.reranker_score']:.2f} - {result['title']}")
```
- The ranker produces a calibrated score 0 → 4 (irrelevant → excellent) by applying cross‑attention between query and document text.
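Because the score is calibrated across queries, you can apply a fixed quality gate to the results. A sketch, assuming a cutoff of 2.0 (tune this per workload; it is not an Azure default):

```python
def filter_by_reranker(results, threshold=2.0):
    """Keep only hits whose semantic reranker score clears the bar.
    Works on the dict-like rows returned by SearchClient.search();
    the score key is '@search.reranker_score' (None when semantic
    ranking was not applied)."""
    return [
        r for r in results
        if (r.get("@search.reranker_score") or 0.0) >= threshold
    ]
```

Dropping sub-threshold chunks before prompt assembly keeps marginal matches from diluting the LLM's context.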
3️⃣ Terraform Configuration for Semantic Ranking
```hcl
resource "azurerm_search_service" "this" {
  name                = "${var.environment}-${var.project}-search"
  resource_group_name = azurerm_resource_group.this.name
  location            = var.location
  sku                 = var.search_sku
  semantic_search_sku = var.semantic_search_sku

  identity {
    type = "SystemAssigned"
  }
}
```
Environment variables
```hcl
# environments/dev.tfvars
search_sku          = "basic"
semantic_search_sku = "free"     # Limited queries/month

# environments/prod.tfvars
search_sku          = "standard"
semantic_search_sku = "standard" # Unlimited semantic queries
```
Benchmark takeaway: Hybrid + Semantic ranking consistently finds the best content at every result‑set size. Pure vector search misses many relevant hits that hybrid catches, and without semantic ranking the top‑relevant result often lands at position 7‑8 instead of 1.
Agentic Retrieval (2025‑11‑01‑preview API)
Agentic retrieval automatically decomposes complex queries into sub‑queries, runs each through the hybrid + semantic pipeline, and merges the outcomes.
Example
User: "Compare our 2024 and 2025 refund policies and highlight what changed"
Decomposition
- Sub‑query 1: “2024 refund policy terms and conditions”
- Sub‑query 2: “2025 refund policy terms and conditions”
- Sub‑query 3: “changes updates refund policy”
Each sub‑query follows:
Hybrid search → Semantic rerank top 50 → Merge
The orchestration lives in the Knowledge Base object; the underlying service, index, and semantic config remain unchanged.
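The orchestration pattern can be sketched as plain Python. This is illustrative only: in the real service the decomposition is an LLM call and the merge happens inside the Knowledge Base object; here `decompose` and `hybrid_search` are hypothetical callables you would supply.

```python
def agentic_retrieve(question, decompose, hybrid_search, top_k=5):
    """Illustrative agentic-retrieval loop: split the question into
    sub-queries, run each through the hybrid + semantic pipeline, and
    merge results with simple first-seen de-duplication."""
    merged, seen = [], set()
    for sub_query in decompose(question):
        for doc in hybrid_search(sub_query):
            if doc["id"] not in seen:
                seen.add(doc["id"])
                merged.append(doc)
    return merged[:top_k]
```

The key property to notice: a chunk relevant to several sub-queries is retrieved once, while each sub-query still contributes its own unique hits to the merged set.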
When to use Agentic Retrieval
- Complex questions with multiple intents.
- Comparative queries (e.g., version‑to‑version analysis).
- Queries that span multiple document categories.
Impact: Adds latency but improves answer quality by up to 40 % on complex queries (Microsoft benchmark).
TL;DR Checklist
- ✅ Use 512‑token chunks with 25 % overlap and preserve sentence boundaries.
- ✅ Enable Hybrid search (BM25 + vector) with RRF fusion.
- ✅ Turn on Semantic ranking (Standard tier, `semantic_search_sku = "standard"`).
- ✅ For multi‑intent or comparative questions, switch to Agentic retrieval via the Knowledge Base API.
- ✅ Manage everything with Terraform for reproducibility across dev / prod environments.
With these patterns, Azure AI Search becomes a production‑grade RAG engine that delivers precise, context‑rich answers at scale. 🚀
Scoped Retrieval with Filterable Fields
Filters run before vector search, narrowing the candidate set:
```python
results = search_client.search(
    search_text="refund policy changes",
    vector_queries=[vector_query],
    query_type="semantic",
    semantic_configuration_name="default",
    filter="department eq 'legal' and year ge 2024",
    top=20,
)
```
Terraform note:
Filterable fields must be defined with filterable: true in the index schema. The schema is usually managed via the Portal, SDK, or REST API (not Terraform), while the search service and its SKU/capabilities are Terraform‑managed.
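As a sketch of what that schema looks like, here is a minimal REST index definition fragment matching the `department`/`year` filter used above (field list abbreviated; the vector and semantic configuration sections are omitted):

```json
{
  "name": "company-docs",
  "fields": [
    { "name": "id", "type": "Edm.String", "key": true },
    { "name": "content", "type": "Edm.String", "searchable": true },
    { "name": "department", "type": "Edm.String", "filterable": true, "facetable": true },
    { "name": "year", "type": "Edm.Int32", "filterable": true, "sortable": true }
  ]
}
```

Only fields marked `"filterable": true` can appear in an OData `filter` expression, so decide on your filterable metadata up front: changing a field's attributes later requires rebuilding the index.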
Azure OpenAI Data‑Source Integration
Two parameters control retrieval quality:
| Parameter | Range | Description |
|---|---|---|
| strictness | 1‑5 | How aggressively irrelevant chunks are filtered out. Higher values = stricter filtering. |
| top_n_documents | 1‑20 | Number of chunks to include in the LLM prompt after filtering & reranking. More chunks = more context (higher token cost). |
Strictness Levels
| Strictness | Behavior | Typical Use‑Case |
|---|---|---|
| 1‑2 | Lenient – includes borderline results | Exploratory questions, broad topics |
| 3 | Balanced (default) | General purpose |
| 4‑5 | Strict – only highly relevant results | Precise factual lookups, compliance |
Example Call
```python
completion = client.chat.completions.create(
    model=deployment,
    messages=[{"role": "user", "content": "What changed in the refund policy?"}],
    extra_body={
        "data_sources": [{
            "type": "azure_search",
            "parameters": {
                "endpoint": search_endpoint,
                "index_name": "company-docs",
                "query_type": "vector_semantic_hybrid",
                "semantic_configuration": "default",
                "strictness": 4,
                "top_n_documents": 5,
                "authentication": {
                    "type": "system_assigned_managed_identity"
                },
            },
        }]
    },
)
```
Tuning Guide
- Model says “I don’t have enough information” → lower `strictness` or raise `top_n_documents`.
- Answers contain irrelevant context → raise `strictness` or lower `top_n_documents`.
Terraform Example (rag/main.tf)
```hcl
resource "azurerm_search_service" "this" {
  name                = "${var.environment}-${var.project}-search"
  resource_group_name = azurerm_resource_group.this.name
  location            = var.location
  sku                 = var.search_sku
  semantic_search_sku = var.semantic_search_sku
  replica_count       = var.search_replicas
  partition_count     = var.search_partitions

  identity {
    type = "SystemAssigned"
  }

  tags = var.tags
}

# Embedding model deployment
resource "azurerm_cognitive_deployment" "embedding" {
  name                 = "text-embedding-3-small"
  cognitive_account_id = azurerm_cognitive_account.openai.id

  model {
    format  = "OpenAI"
    name    = "text-embedding-3-small"
    version = "1"
  }

  sku {
    name     = "Standard"
    capacity = var.embedding_capacity
  }
}
```
Variable Files
environments/dev.tfvars
```hcl
search_sku          = "basic"
semantic_search_sku = "free"
search_replicas     = 1
search_partitions   = 1
embedding_capacity  = 30
strictness          = 3
top_n_documents     = 5
```
environments/prod.tfvars
```hcl
search_sku          = "standard"
semantic_search_sku = "standard"
search_replicas     = 2
search_partitions   = 1
embedding_capacity  = 120
strictness          = 4
top_n_documents     = 5
```
Feature Comparison
| Feature | Azure AI Search | AWS Bedrock KB | GCP RAG Engine |
|---|---|---|---|
| Chunking | Fixed‑size + Document Layout skill | Fixed, hierarchical, semantic, Lambda | Fixed‑size only |
| Hybrid search | BM25 + vector via RRF (built‑in) | Supported on OpenSearch | Alpha‑weighted dense/sparse |
| Semantic reranking | Built‑in transformer ranker (L2) | Cohere Rerank | Rank Service + LLM Ranker |
| Query decomposition | Agentic retrieval (native) | Native API parameter | Not built‑in |
| Metadata filtering | Filterable index fields + OData | JSON metadata files in S3 | Filter string at query time |
| Strictness control | 1‑5 scale on data source | Not built‑in | Vector distance threshold |
| Reranker score range | 0‑4 (calibrated, cross‑query consistent) | Model‑dependent | Model‑dependent |
Azure’s advantage is the most mature retrieval pipeline – three layers (hybrid, semantic ranking, agentic) that compose together. The semantic ranker’s calibrated scoring also enables consistent quality thresholds across different indexes and query patterns.
Suggested Configurations
| Situation | Query Type | Semantic Ranker | Strictness | top_n_documents |
|---|---|---|---|---|
| Getting started | vector_simple_hybrid | Free tier | 3 | 5 |
| Production (general) | vector_semantic_hybrid | Standard | 3 | 5 |
| Precise factual lookup | vector_semantic_hybrid | Standard | 4‑5 | 3 |
| Broad research queries | vector_semantic_hybrid | Standard | 2 | 10 |
| Complex multi‑part questions | Agentic retrieval | Standard | 3 | 5 |
Recommendation: Start with vector_semantic_hybrid on the Standard tier – the default benchmarked by Microsoft. Add agentic retrieval for especially complex query patterns.
Series Context
- Post 1: Azure AI Search RAG – Basic Setup 🔍
- Post 2: Advanced RAG – Hybrid Search, Semantic Ranking, Agentic Retrieval (you’re here) 🧠
Your RAG pipeline now leverages the full Azure AI Search arsenal: hybrid search for recall, semantic ranking for precision, agentic retrieval for complex queries, metadata filtering for scope, and strictness tuning for noise control – all driven by Terraform variables per environment.
Found this helpful? Follow for the full RAG Pipeline with Terraform series! 💬