The Life of a Search Query in OpenSearch

Published: May 1, 2026 at 09:10 AM EDT
4 min read
Source: Dev.to

Overview

A search request typically looks like:

GET /my-index/_search
Content-Type: application/json

{
  "query": { "match": { "title": "opensearch" } },
  "size": 10,
  "from": 0,
  "_source": true
}

The client can be anything from curl to a Python SDK; the wire format is always JSON over HTTP.

HTTP Request and Coordinating Node

  1. HTTP server – OpenSearch runs a lightweight HTTP server that parses the request.
  2. Coordinating node – The node that receives the request becomes the coordinating node. Any node in the cluster can act as the coordinator; it does not need to store data.

Sharding and Routing

  • Data is stored in shards – Lucene indexes distributed across the cluster.
  • Each document is assigned to a primary shard based on a routing value (default: the document _id), using the formula hash(routing) % number_of_primary_shards.
  • The coordinating node runs this hash function to determine which shards are responsible for the query.
  • For a simple term query across a single index, the coordinator generally has to contact one copy (primary or replica) of every shard in that index. Supplying a specific routing value can shrink that shard set dramatically and improve latency.
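The routing step can be simulated in a few lines. Note the hedge: OpenSearch actually hashes the routing string with Murmur3; `zlib.crc32` is used here only so the sketch runs with the standard library, and `pick_shard` is an illustrative name, not an OpenSearch API.

```python
import zlib

def pick_shard(routing: str, num_primary_shards: int) -> int:
    """Illustrative stand-in for hash(routing) % number_of_primary_shards.

    OpenSearch uses a Murmur3 hash internally; crc32 is a placeholder here.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards

# With no explicit routing value, the document _id is used as the routing key.
print(pick_shard("doc-42", 5))  # a stable shard number in [0, 5)
```

The important property is determinism: the same routing value always maps to the same shard, which is what lets both indexing and search find a document without a lookup table.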

Query Execution on Shards

Once the responsible shards are known, the coordinator forwards the query to the shard nodes (which may be the same physical node or a different one). Each shard executes the query locally against its Lucene segments.

  • Lucene stores data in immutable segments. During the query phase, each segment is searched independently.
  • OpenSearch can search segments in parallel within a shard – a feature called concurrent segment search (generally available since version 2.12). The engine automatically decides how many slices to create based on CPU cores and segment size.
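The per-shard flow above can be sketched as follows: each immutable segment is searched independently (here concurrently via a thread pool), and the per-segment hits are combined into the shard-local top-k. The data structures and function names are illustrative toys, not OpenSearch internals.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Each "segment" is a toy posting list: doc_id -> score for docs matching a term.
segments = [
    {"d1": 1.2, "d7": 0.4},
    {"d3": 2.1},
    {"d5": 0.9, "d9": 1.7},
]

def search_segment(segment):
    # In Lucene, each immutable segment is searched independently.
    return list(segment.items())

def search_shard(segments, k=2):
    # Segments are searched concurrently (as in concurrent segment search),
    # then merged into the shard-local top-k by score.
    with ThreadPoolExecutor() as pool:
        per_segment = pool.map(search_segment, segments)
    all_hits = [hit for hits in per_segment for hit in hits]
    return heapq.nlargest(k, all_hits, key=lambda hit: hit[1])

print(search_shard(segments, k=2))  # [('d3', 2.1), ('d9', 1.7)]
```

Because segments are immutable, no locking is needed across the slices; that is what makes this parallelism cheap.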

Scoring (BM25)

  • For each matching document, Lucene computes a relevance score using the BM25 algorithm.
  • Key parameters: term frequency, inverse document frequency, and document-length normalisation, controlled by k1 (default 1.2) and b (default 0.75).
  • Each shard returns the top‑k (default 10) documents together with their scores.
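As a sketch, the single-term BM25 formula with Lucene's defaults looks like this. Real Lucene scoring adds details such as query boosts and precomputed norms, so treat this as the textbook form, not the exact engine code.

```python
import math

def bm25_term_score(tf, df, num_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Single-term BM25 score (Lucene-style defaults k1=1.2, b=0.75).

    tf: term frequency in the document; df: number of documents containing
    the term; doc_len / avg_doc_len: field length and its corpus average.
    """
    # Rarer terms get a larger inverse-document-frequency weight.
    idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
    # Term frequency saturates via k1; b penalises longer-than-average docs.
    norm_tf = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * norm_tf

# A rare term in a short document scores higher than a common term in a long one.
print(bm25_term_score(tf=2, df=5, num_docs=1_000, doc_len=100, avg_doc_len=120))
```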

Fetch Phase

The query phase only returns document IDs and scores. If the client requested the _source field (as most do), a second round called the fetch phase runs:

  1. The coordinating node asks each shard for the full source of the selected documents.
  2. Shards retrieve the stored _source from Lucene’s stored fields and send it back.

Because the fetch phase may move larger payloads over the network, OpenSearch tries to keep the number of fetched documents small. Pagination (from/size) and stored_fields filters are important performance knobs.
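For example, restricting _source to the fields the client actually needs keeps fetch-phase payloads small (the field names here are illustrative):

```json
GET /my-index/_search
{
  "query": { "match": { "title": "opensearch" } },
  "size": 10,
  "from": 0,
  "_source": ["title", "url"]
}
```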

Merging Results

After receiving the top‑k results from each shard, the coordinating node:

  1. Merges the per-shard lists into a single set of candidates.
  2. Sorts the combined set by the BM25 scores returned by each shard, or by any custom sort rules specified in the query.
  3. Re-applies the global from and size parameters to produce the final page of results.

The final merged list is formatted as a JSON response and sent back to the client.
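Since each shard already returns its hits sorted by score, the coordinator's job is a k-way merge followed by the global from/size window. A minimal sketch (`merge_shard_results` is an illustrative name, not an OpenSearch API):

```python
import heapq

# Top-k hits returned by each shard during the query phase: (doc_id, score),
# already sorted by descending score within each shard.
shard_results = [
    [("d3", 2.1), ("d1", 1.2)],   # shard 0
    [("d9", 1.7), ("d5", 0.9)],   # shard 1
    [("d2", 1.5)],                # shard 2
]

def merge_shard_results(shard_results, size=10, from_=0):
    # k-way merge of the sorted per-shard lists, highest score first,
    # then the global from/size window is applied to the merged stream.
    merged = heapq.merge(*shard_results, key=lambda hit: hit[1], reverse=True)
    return list(merged)[from_ : from_ + size]

print(merge_shard_results(shard_results, size=3))
# [('d3', 2.1), ('d9', 1.7), ('d2', 1.5)]
```

This also shows why deep pagination is expensive: to serve from=10000, every shard must still return its top 10010 hits for the coordinator to merge.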

Near‑Real‑Time Indexing

While a query is being processed, OpenSearch maintains a near‑real‑time view of the data:

  • New documents are first written to an in‑memory buffer and appended to the translog for durability.
  • Every second (default index.refresh_interval), the buffer is flushed to a new Lucene segment, making the freshly indexed documents searchable.
  • This results in a typical < 1‑second lag between indexing and visibility in search results.
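The refresh cadence is tunable per index; for example, relaxing it on a write-heavy index trades a little freshness for indexing throughput:

```json
PUT /my-index/_settings
{
  "index": { "refresh_interval": "5s" }
}
```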

Common Issues and Mitigations

| Issue | Why it happens | Mitigation |
| --- | --- | --- |
| Slow query latency | Too many shards queried, high segment count | Use routing, configure `index.routing_partition_size`, force-merge to reduce segments |
| High CPU usage | Concurrent segment search on large shards | Tune `search.max_concurrent_shard_requests` and `search.max_concurrent_segments` |
| Stale results | Refresh interval too large for real-time needs | Reduce `index.refresh_interval` on hot indices |
| Large payloads | Fetching full `_source` for many docs | Use `stored_fields` or `docvalue_fields`, limit `size` |

Conclusion

A search query in OpenSearch is more than a simple HTTP call. It involves routing, parallel shard execution, scoring, optional fetching, and a final merge step that stitches everything together. Understanding each stage helps you design better schemas, tune performance, and avoid common pitfalls such as unnecessary shard scans or excessive refresh intervals. By visualising the journey of a query, you gain the confidence to diagnose latency issues, choose the right indexing strategies, and make the most of OpenSearch’s powerful plugin and analysis ecosystems.
