Accelerating Solr Search for Global Financial Institutions

Published: March 9, 2026 at 07:20 AM EDT

Source: Dev.to

When a global financial institution processes thousands of multilingual regulatory reports, public development indicators, and economic whitepapers every day, a search experience that merely works is not enough: it must be instantaneous and exact.

In a recent architecture overhaul for a major international development organization running Drupal, we hit a critical performance bottleneck: the primary Apache Solr search cluster was buckling under complex faceted queries and multilingual taxonomy translations. During peak traffic, query execution times spiked above 3 seconds, rendering the “Knowledge Center” unusable for researchers.

```mermaid
graph LR
    A[User Query] --> B[Decoupled Search UI]
    B -->|Async Request| C[Edge Cache / Memcache]
    C -- "Miss" --> D[Apache Solr Cluster]
    D -->|Faceted Results| C
    C -- "Hit" --> B
    E[Drupal Origin] -->|Flattened Sync| D
```

The Architecture Bottleneck

The existing implementation used Drupal’s search_api_solr module in a standard configuration. That setup is adequate for smaller datasets, but it failed at enterprise scale for three reasons:

Over-indexing: Every node revision and paragraph entity was being indexed unnecessarily, ballooning the index size.
Inefficient Faceting: Distinct facet counts were calculated across millions of records in real time, with no caching.
Language Fallbacks: Solr struggled to apply complex edge n-gram tokenization across English, Spanish, Portuguese, and French at the same time.
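To see why edge n-grams are expensive, consider what the filter emits for a single word. The sketch below is plain PHP, not Solr code: the function name `edge_ngrams()` is hypothetical, and its two bounds mirror the `minGramSize` and `maxGramSize` parameters of Solr's `EdgeNGramFilterFactory`. Every indexed word becomes one term per prefix length, and running the same content through four language analyzers multiplies that again.

```php
<?php
/**
 * Hypothetical helper: emit the leading-edge prefixes of a single token,
 * the way an edge n-gram filter does for prefix (autocomplete) matching.
 */
function edge_ngrams(string $token, int $min_gram, int $max_gram): array {
  $grams = [];
  // Never emit a prefix longer than the token itself.
  $upper = min($max_gram, mb_strlen($token));
  for ($n = $min_gram; $n <= $upper; $n++) {
    $grams[] = mb_substr($token, 0, $n);
  }
  return $grams;
}

// One ten-letter word yields nine index terms with bounds of 2 and 15.
$grams = edge_ngrams('investment', 2, 15);
echo count($grams), ': ', implode(' ', $grams), PHP_EOL;
// prints "9: in inv inve inves invest investm investme investmen investment"
```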

The Optimization Strategy

1. Index Pruning and Entity Resolution

We audited the search_api index configuration and, using custom data alterations, stripped out non-essential metadata before it reached Solr.

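```php
use Drupal\search_api\IndexInterface;

/**
 * Implements hook_search_api_index_items_alter().
 *
 * Prunes unnecessary Paragraph entities and fields from the Solr index.
 */
function my_module_search_api_index_items_alter(IndexInterface $index, array &$items) {
  foreach ($items as $item_id => $item) {
    // Skip standalone Paragraph items (assumes the host node indexes their text).
    if ($item->getDatasourceId() === 'entity:paragraph') {
      unset($items[$item_id]);
      continue;
    }
    // Drop revision history and admin tracking fields (NULL removes a field).
    $item->setField('revision_log', NULL);
    $item->setField('field_internal_admin_notes', NULL);

    // Flatten nested locations into a single multi-value string
    // to avoid Solr join queries.
    if ($location_field = $item->getField('field_location_coordinates')) {
      $values = $location_field->getValues();
      // ... logic to simplify data ...
    }
  }
}
```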
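The same two ideas, dropping fields that should never be searchable and denormalizing nested data so Solr can filter without joins, can be shown outside Drupal on a plain document array. This is a minimal sketch, not the module's elided logic: `prepare_search_document()` and the `locations`, `region`, and `country` keys are hypothetical.

```php
<?php
/**
 * Hypothetical sketch: prepare a plain document array for Solr by keeping
 * only allowlisted fields and flattening nested location records.
 */
function prepare_search_document(array $doc, array $allowed_fields): array {
  // Index pruning: anything not explicitly allowlisted never reaches Solr.
  $pruned = array_intersect_key($doc, array_flip($allowed_fields));

  // Denormalization: nested records become one multi-valued string field,
  // so filtering by location is a term match instead of a join.
  if (isset($pruned['locations']) && is_array($pruned['locations'])) {
    $pruned['locations'] = array_map(
      fn(array $loc): string => $loc['region'] . ' > ' . $loc['country'],
      $pruned['locations']
    );
  }
  return $pruned;
}

$doc = [
  'title' => 'Regional Outlook',
  'revision_log' => 'Fixed typo',
  'locations' => [
    ['region' => 'Latin America', 'country' => 'Brazil'],
    ['region' => 'Africa', 'country' => 'Kenya'],
  ],
];
print_r(prepare_search_document($doc, ['title', 'locations']));
```

Assuming `locations` is mapped to a multi-valued string field, a filter on one flattened value matches directly against the document itself rather than requiring a join against separately indexed location records.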

2. Memcache-Backed Facet Caching

Facets filter results dynamically, and recalculating their counts on every keystroke puts heavy load on the search backend. We implemented aggressive facet caching backed by Memcached.

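```php
use Drupal\Core\Cache\Cache;

// Backend implementation of Facet Result Caching (method of a custom service).
public function getCachedFacets(string $query_hash, string $index_id) {
  // Assumes a 'search_api_facets' cache bin routed to Memcached in settings.php.
  $bin = \Drupal::cache('search_api_facets');
  if ($cache = $bin->get($query_hash)) {
    return $cache->data; // Cache hit: no Solr round trip.
  }

  $results = $this->calculateFacetsFromSolr();
  // Search API invalidates 'search_api_list:INDEX_ID' when that index changes,
  // so cached counts are dropped after reindexing.
  $bin->set($query_hash, $results, Cache::PERMANENT, ['search_api_list:' . $index_id]);
  return $results;
}
```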
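The caching code above depends on `$query_hash` being stable: two requests that select the same facets in a different order, or type the same keys with different spacing or case, should hit the same cache entry. The post does not say how the hash is built; the following is one possible approach using a hypothetical `facet_cache_key()` helper. Lowercasing the keys assumes the search fields are case-insensitive, and the language code is part of the key because facet labels differ per translation.

```php
<?php
/**
 * Hypothetical helper: build a stable cache key for a faceted query so that
 * equivalent requests share one cache entry.
 */
function facet_cache_key(string $keys, array $filters, string $langcode): string {
  // Normalize free-text keys: trim, collapse whitespace, lowercase.
  $normalized_keys = mb_strtolower(preg_replace('/\s+/u', ' ', trim($keys)));

  // Sort filter fields and their selected values so order does not matter.
  ksort($filters);
  foreach ($filters as $field => $values) {
    $values = (array) $values;
    sort($values);
    $filters[$field] = $values;
  }

  $payload = json_encode([$normalized_keys, $filters, $langcode]);
  return 'facets:' . hash('sha256', $payload);
}

echo facet_cache_key('  Climate   Finance ', ['region' => ['Asia', 'Africa']], 'en'), PHP_EOL;
```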

3. Semantic Language Tokenization

To handle English, Spanish, and French in a single index without degrading relevance, we replaced the standard tokenizers in schema.xml with language-specific stemming and stop-word filters. This ensures that a Spanish search for “inversión” is matched against Spanish stems only, without polluting the English results for the related English stem “invest”.
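In Solr this separation is configured per field type, with filters such as `solr.StopFilterFactory` and `solr.SnowballPorterFilterFactory` chained in each language's analyzer. The PHP sketch below only models the routing idea: text is tokenized, filtered with its own language's stop words, and stored under a language-suffixed field, so an English query against `body_en` never sees Spanish terms. Stemming is left out, and the stop-word lists, the `analyze_for_language()` function, and the `body_*` field naming are hypothetical.

```php
<?php
// Hypothetical, deliberately tiny stop-word lists; real Solr configs ship
// much longer per-language files.
const STOPWORDS = [
  'en' => ['the', 'of', 'and', 'in'],
  'es' => ['el', 'la', 'de', 'y', 'en'],
  'fr' => ['le', 'la', 'de', 'et', 'en'],
];

/**
 * Tokenizes text, removes stop words for its own language only, and returns
 * the tokens keyed by a language-suffixed field name (e.g. "body_es").
 */
function analyze_for_language(string $field, string $text, string $langcode): array {
  $stopwords = STOPWORDS[$langcode] ?? [];
  // Split on anything that is not a Unicode letter or digit.
  $tokens = preg_split('/[^\p{L}\p{N}]+/u', mb_strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
  $kept = array_filter($tokens, fn(string $t): bool => !in_array($t, $stopwords, true));
  return [$field . '_' . $langcode => array_values($kept)];
}

print_r(analyze_for_language('body', 'La inversión en el sector', 'es'));
print_r(analyze_for_language('body', 'The state of the economy', 'en'));
```

Because each language lands in its own field with its own filter chain, a word that is a stop word in one language (such as “en” in Spanish) is not silently discarded from another language's text.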

The Outcome

Following deployment to production:

Query Time: Average search response time dropped from ~3,200 ms to 140 ms.
Infrastructure Costs: We were able to scale down the Solr cluster by one tier, saving significant monthly OPEX.

Originally published at VictorStack AI — Drupal & WordPress Reference.
