Azure AI Search Advanced RAG with Terraform: Hybrid Search, Semantic Ranking, and Agentic Retrieval 🧠

Published: February 28, 2026, 03:00 AM EST
8 min read
Source: Dev.to

Vector search alone leaves relevance on the table.
Hybrid search with semantic ranking, chunking strategies, metadata filtering, strictness tuning, and the new agentic retrieval pipeline turns Azure AI Search into a production RAG system – all wired through Terraform.

Recap (RAG Post 1)

  • Deployed Azure AI Search with a basic index and connected it to Azure OpenAI.
  • Retrieval quality was mediocre – users asked nuanced questions and got incomplete or irrelevant answers.

Fix: Not a better generation model, but better retrieval.

Azure AI Search offers the most sophisticated built‑in retrieval pipeline of the three major clouds:

  1. Hybrid search – BM25 keyword matching + vector similarity via Reciprocal Rank Fusion (RRF).
  2. Transformer‑based semantic ranker – deep re‑scoring of the top results.
  3. Metadata filtering & strictness controls.
  4. Agentic retrieval – automatically decomposes complex queries.

⚠️ Important: Azure OpenAI “On‑Your‑Data” is deprecated. Microsoft recommends migrating to Foundry Agent Service with Foundry IQ. The patterns below use direct Azure AI Search integration, which works with both the current and future architecture.

Chunking Strategies

Azure AI Search supports two chunking approaches via its indexer pipeline:

| Approach | How it works | Pros | Cons |
|---|---|---|---|
| Fixed-size chunking (Text Split skill) | Splits by token or character count with configurable overlap. | Simple, predictable, cost-effective. | Ignores document structure. |
| Structure-aware chunking (Document Layout skill) | Uses Azure Document Intelligence to recognize headers, sections, and layout elements. | Preserves document hierarchy. | Adds per-page processing cost. |
Recommended chunk sizes:

| Chunk Size | Overlap | Best For | Trade-off |
|---|---|---|---|
| 256 tokens | 25 % | Short FAQs, Q&A pairs | High precision, less context |
| 512 tokens | 25 % | General documents (default) | Best balance of precision & context |
| 1024 tokens | 10-15 % | Long technical/legal docs | More context, risk of noise |

Key insight: Preserve sentence boundaries when chunking. Splitting mid‑sentence degrades embedding quality and retrieval accuracy.
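A minimal sketch of a client-side chunker that respects sentence boundaries (this is an illustration, not the Text Split skill itself; token counts are approximated by whitespace-separated words, so swap in a real tokenizer for production):

```python
import re

def chunk_by_sentences(text, max_tokens=512, overlap_ratio=0.25):
    """Greedy chunker that never splits mid-sentence.

    Token counts are approximated by whitespace-separated words;
    a production pipeline would use a real tokenizer.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, current_len = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            # Carry trailing sentences forward as overlap.
            budget = int(max_tokens * overlap_ratio)
            kept = []
            for s in reversed(current):
                if sum(len(x.split()) for x in kept) + len(s.split()) > budget:
                    break
                kept.insert(0, s)
            current = kept
            current_len = sum(len(s.split()) for s in current)
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because overlap is carried at sentence granularity, every chunk both starts and ends on a sentence boundary, which is exactly the property the embedding model benefits from.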

Layered Retrieval Architecture

1️⃣ Hybrid Search (BM25 + Vector) + RRF Fusion

```python
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizableTextQuery

search_client = SearchClient(
    endpoint=search_endpoint,
    index_name="company-docs",
    credential=credential,
)

vector_query = VectorizableTextQuery(
    text="What are the penalties for late delivery?",
    k_nearest_neighbors=50,
    fields="contentVector",
)

results = search_client.search(
    search_text="penalties late delivery",   # BM25 keyword query
    vector_queries=[vector_query],           # Vector query
    select=["title", "content", "source"],
    top=50,
)
```

Why both?

  • Vector handles synonyms & paraphrasing.
  • Keyword catches product codes, policy numbers, exact terminology.
  • RRF merges the two result sets, delivering the best of both worlds.
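Under the hood, RRF scores each document as the sum of 1/(k + rank) across the lists it appears in, with k conventionally set to 60. A self-contained sketch with hypothetical document IDs:

```python
def rrf_fuse(result_lists, k=60):
    """Merge ranked lists of document IDs with Reciprocal Rank Fusion.

    A document's fused score is the sum of 1 / (k + rank) over every
    list in which it appears; higher is better.
    """
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical ranked lists from the keyword and vector legs:
bm25_hits = ["doc-policy", "doc-sku42", "doc-faq"]
vector_hits = ["doc-faq", "doc-policy", "doc-blog"]
print(rrf_fuse([bm25_hits, vector_hits]))
# → ['doc-policy', 'doc-faq', 'doc-sku42', 'doc-blog']
```

Documents that appear in both lists accumulate score from each, which is why hybrid results dominate the fused ranking even when neither leg ranked them first.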

2️⃣ Semantic Ranker (Transformer Re‑scoring)

```python
results = search_client.search(
    search_text="penalties late delivery",
    vector_queries=[vector_query],
    query_type="semantic",
    semantic_configuration_name="default",
    top=50,
)

for result in results:
    print(f"Score: {result['@search.reranker_score']:.2f} - {result['title']}")
```

The ranker produces a calibrated score from 0 (irrelevant) to 4 (excellent) by applying cross-attention between the query and document text.
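Because the score is calibrated, a fixed cutoff works consistently across indexes. A common pattern (the threshold value is an assumption; tune it per corpus) is to drop low-scoring chunks before they reach the prompt:

```python
MIN_RERANKER_SCORE = 2.0  # assumed cutoff; tune per corpus

def keep_relevant(results, threshold=MIN_RERANKER_SCORE):
    """Keep only chunks the semantic ranker scored at or above the threshold.

    `results` is an iterable of result dicts as yielded by
    SearchClient.search(..., query_type="semantic").
    """
    return [r for r in results
            if r.get("@search.reranker_score", 0.0) >= threshold]

# Illustrative results shaped like the SDK's output:
sample = [
    {"title": "Refund policy", "@search.reranker_score": 3.1},
    {"title": "Old blog post", "@search.reranker_score": 0.8},
]
print([r["title"] for r in keep_relevant(sample)])
# → ['Refund policy']
```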

3️⃣ Terraform Configuration for Semantic Ranking

```hcl
resource "azurerm_search_service" "this" {
  name                = "${var.environment}-${var.project}-search"
  resource_group_name = azurerm_resource_group.this.name
  location            = var.location
  sku                 = var.search_sku

  semantic_search_sku = var.semantic_search_sku

  identity {
    type = "SystemAssigned"
  }
}
```

Environment variables

```hcl
# environments/dev.tfvars
search_sku          = "basic"
semantic_search_sku = "free"    # Limited queries/month

# environments/prod.tfvars
search_sku          = "standard"
semantic_search_sku = "standard" # Unlimited semantic queries
```

Benchmark takeaway: Hybrid + Semantic ranking consistently finds the best content at every result‑set size. Pure vector search misses many relevant hits that hybrid catches, and without semantic ranking the top‑relevant result often lands at position 7‑8 instead of 1.

Agentic Retrieval (2025‑11‑01‑preview API)

Agentic retrieval automatically decomposes complex queries into sub‑queries, runs each through the hybrid + semantic pipeline, and merges the outcomes.

Example

User: "Compare our 2024 and 2025 refund policies and highlight what changed"

Decomposition

  1. Sub‑query 1: “2024 refund policy terms and conditions”
  2. Sub‑query 2: “2025 refund policy terms and conditions”
  3. Sub‑query 3: “changes updates refund policy”

Each sub‑query follows:

Hybrid search → Semantic rerank top 50 → Merge

The orchestration lives in the Knowledge Base object; the underlying service, index, and semantic config remain unchanged.
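Since the Knowledge Base API is still in preview, here is a hand-rolled sketch of the same orchestration: run each sub-query through a hybrid + semantic search function (`search_fn` below is a stand-in for the SearchClient call shown earlier) and merge on the best reranker score each document achieved in any sub-query:

```python
def agentic_retrieve(search_fn, sub_queries, top=50, final_k=10):
    """Run each sub-query through the retrieval pipeline and merge results.

    `search_fn(query, top=...)` is assumed to return result dicts carrying
    an "id" and an "@search.reranker_score"; each document keeps the best
    score it achieved across all sub-queries.
    """
    best = {}
    for query in sub_queries:
        for doc in search_fn(query, top=top):
            score = doc.get("@search.reranker_score", 0.0)
            doc_id = doc["id"]
            if doc_id not in best or score > best[doc_id][0]:
                best[doc_id] = (score, doc)
    ranked = sorted(best.values(), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:final_k]]
```

The managed Knowledge Base version additionally performs the query decomposition itself via an LLM; this sketch assumes the sub-queries are already known.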

When to use Agentic Retrieval

  • Complex questions with multiple intents.
  • Comparative queries (e.g., version‑to‑version analysis).
  • Queries that span multiple document categories.

Impact: Adds latency but improves answer quality by up to 40 % on complex queries (Microsoft benchmark).

TL;DR Checklist

  • ✅ Use 512‑token chunks with 25 % overlap and preserve sentence boundaries.
  • ✅ Enable Hybrid search (BM25 + vector) with RRF fusion.
  • ✅ Turn on Semantic ranking (Standard tier, semantic_search_sku = "standard").
  • ✅ For multi‑intent or comparative questions, switch to Agentic retrieval via the Knowledge Base API.
  • ✅ Manage everything with Terraform for reproducibility across dev / prod environments.

With these patterns, Azure AI Search becomes a production‑grade RAG engine that delivers precise, context‑rich answers at scale. 🚀

Scoped Retrieval with Filterable Fields

Filters run before vector search, narrowing the candidate set:

```python
results = search_client.search(
    search_text="refund policy changes",
    vector_queries=[vector_query],
    query_type="semantic",
    semantic_configuration_name="default",
    filter="department eq 'legal' and year ge 2024",
    top=20,
)
```

Terraform note:
Filterable fields must be defined with filterable: true in the index schema. The schema is usually managed via the Portal, SDK, or REST API (not Terraform), while the search service and its SKU/capabilities are Terraform‑managed.
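One practical wrinkle: filter strings are OData, so string literals use single quotes and any embedded quote must be doubled. A small hypothetical helper (not part of the SDK) builds safe clauses:

```python
def odata_eq(field, value):
    """Build an OData `eq` clause for an Azure AI Search filter string.

    String values are single-quoted with embedded quotes doubled, per
    OData escaping rules; numeric values pass through unquoted.
    """
    if isinstance(value, str):
        escaped = value.replace("'", "''")
        return f"{field} eq '{escaped}'"
    return f"{field} eq {value}"

filter_expr = odata_eq("department", "legal") + " and year ge 2024"
# filter_expr == "department eq 'legal' and year ge 2024"
```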

Azure OpenAI Data‑Source Integration

Two parameters control retrieval quality:

| Parameter | Range | Description |
|---|---|---|
| strictness | 1-5 | How aggressively irrelevant chunks are filtered out. Higher values = stricter filtering. |
| top_n_documents | 1-20 | Number of chunks to include in the LLM prompt after filtering & reranking. More chunks = more context (higher token cost). |

Strictness Levels

| Strictness | Behavior | Typical Use-Case |
|---|---|---|
| 1-2 | Lenient – includes borderline results | Exploratory questions, broad topics |
| 3 | Balanced (default) | General purpose |
| 4-5 | Strict – only highly relevant results | Precise factual lookups, compliance |

Example Call

```python
completion = client.chat.completions.create(
    model=deployment,
    messages=[{"role": "user", "content": "What changed in the refund policy?"}],
    extra_body={
        "data_sources": [{
            "type": "azure_search",
            "parameters": {
                "endpoint": search_endpoint,
                "index_name": "company-docs",
                "query_type": "vector_semantic_hybrid",
                "semantic_configuration": "default",
                "strictness": 4,
                "top_n_documents": 5,
                "authentication": {
                    "type": "system_assigned_managed_identity"
                }
            }
        }]
    }
)
```

Tuning Guide

  • Model says “I don’t have enough information” → lower strictness or raise top_n_documents.
  • Answers contain irrelevant context → raise strictness or lower top_n_documents.
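The first rule can even be automated as a simple feedback heuristic (a sketch; the trigger phrase and step sizes are assumptions to tune for your deployment, and the opposite signal, irrelevant context, is harder to detect automatically):

```python
def adjust_params(params, answer):
    """Nudge retrieval parameters based on the last answer (heuristic sketch)."""
    p = dict(params)
    if "don't have enough information" in answer.lower():
        # Too little context reached the model: loosen the filter.
        p["strictness"] = max(1, p["strictness"] - 1)
        p["top_n_documents"] = min(20, p["top_n_documents"] + 2)
    return p

print(adjust_params({"strictness": 4, "top_n_documents": 5},
                    "I don't have enough information to answer."))
# → {'strictness': 3, 'top_n_documents': 7}
```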

Terraform Example (rag/main.tf)

```hcl
resource "azurerm_search_service" "this" {
  name                = "${var.environment}-${var.project}-search"
  resource_group_name = azurerm_resource_group.this.name
  location            = var.location
  sku                 = var.search_sku

  semantic_search_sku = var.semantic_search_sku
  replica_count       = var.search_replicas
  partition_count     = var.search_partitions

  identity {
    type = "SystemAssigned"
  }

  tags = var.tags
}

# Embedding model deployment
resource "azurerm_cognitive_deployment" "embedding" {
  name                 = "text-embedding-3-small"
  cognitive_account_id = azurerm_cognitive_account.openai.id

  model {
    format  = "OpenAI"
    name    = "text-embedding-3-small"
    version = "1"
  }

  sku {
    name     = "Standard"
    capacity = var.embedding_capacity
  }
}
```

Variable Files

environments/dev.tfvars

```hcl
search_sku          = "basic"
semantic_search_sku = "free"
search_replicas     = 1
search_partitions   = 1
embedding_capacity  = 30
strictness          = 3
top_n_documents     = 5
```

environments/prod.tfvars

```hcl
search_sku          = "standard"
semantic_search_sku = "standard"
search_replicas     = 2
search_partitions   = 1
embedding_capacity  = 120
strictness          = 4
top_n_documents     = 5
```

Feature Comparison

| Feature | Azure AI Search | AWS Bedrock KB | GCP RAG Engine |
|---|---|---|---|
| Chunking | Fixed-size + Document Layout skill | Fixed, hierarchical, semantic, Lambda | Fixed-size only |
| Hybrid search | BM25 + vector via RRF (built-in) | Supported on OpenSearch | Alpha-weighted dense/sparse |
| Semantic reranking | Built-in transformer ranker (L2) | Cohere Rerank | Rank Service + LLM Ranker |
| Query decomposition | Agentic retrieval (native) | Native API parameter | Not built-in |
| Metadata filtering | Filterable index fields + OData | JSON metadata files in S3 | Filter string at query time |
| Strictness control | 1-5 scale on data source | Not built-in | Vector distance threshold |
| Reranker score range | 0-4 (calibrated, cross-query consistent) | Model-dependent | Model-dependent |

Azure’s advantage is the most mature retrieval pipeline – three layers (hybrid, semantic ranking, agentic) that compose together. The semantic ranker’s calibrated scoring also enables consistent quality thresholds across different indexes and query patterns.

Suggested Configurations

| Situation | Query Type | Semantic Ranker | Strictness | top_n_documents |
|---|---|---|---|---|
| Getting started | vector_simple_hybrid | Free tier | 3 | 5 |
| Production (general) | vector_semantic_hybrid | Standard | 3 | 5 |
| Precise factual lookup | vector_semantic_hybrid | Standard | 4-5 | 3 |
| Broad research queries | vector_semantic_hybrid | Standard | 2 | 10 |
| Complex multi-part questions | Agentic retrieval | Standard | 3 | 5 |

Recommendation: Start with vector_semantic_hybrid on the Standard tier – the default benchmarked by Microsoft. Add agentic retrieval for especially complex query patterns.

Series Context

  • Post 1: Azure AI Search RAG – Basic Setup 🔍
  • Post 2: Advanced RAG – Hybrid Search, Semantic Ranking, Agentic Retrieval (you’re here) 🧠

Your RAG pipeline now leverages the full Azure AI Search arsenal: hybrid search for recall, semantic ranking for precision, agentic retrieval for complex queries, metadata filtering for scope, and strictness tuning for noise control – all driven by Terraform variables per environment.

Found this helpful? Follow for the full RAG Pipeline with Terraform series! 💬
