Accelerating Solr Search for Global Financial Institutions

Published: March 9, 2026 at 07:20 AM EDT
3 min read
Source: Dev.to


When a global financial institution processes thousands of multi‑lingual regulatory reports, public development indicators, and economic whitepapers daily, the search experience cannot be purely functional—it must be instantaneous and exact.

In a recent architecture overhaul for a major international development organization running Drupal, we encountered a critical performance bottleneck: the primary Apache Solr search cluster was buckling under the weight of complex faceted queries and multi‑lingual taxonomy translations. Query execution times were spiking over 3 seconds during peak traffic hours, rendering the “Knowledge Center” unusable for researchers.

The revised request flow offloads repeat queries from Solr to an edge cache:

```mermaid
graph LR
    A[User Query] --> B[Decoupled Search UI]
    B -->|Async Request| C[Edge Cache / Memcache]
    C -- "Miss" --> D[Apache Solr Cluster]
    D -->|Faceted Results| C
    C -- "Hit" --> B
    E[Drupal Origin] -->|Flattened Sync| D
```

The Architecture Bottleneck

The existing implementation utilized Drupal’s search_api_solr module in a standard configuration. While adequate for smaller datasets, it failed at enterprise scale due to:

  • Over‑indexing: Every node revision and paragraph entity was being indexed unnecessarily, ballooning the index size.
  • Inefficient Faceting: Calculating distinct facet counts across millions of records in real‑time without caching.
  • Language Fallbacks: Solr struggled with complex edge n‑gram tokenization across English, Spanish, Portuguese, and French simultaneously.

The Optimization Strategy

Rather than simply provisioning larger AWS instances, we implemented a software‑level optimization strategy focusing on indexing efficiency and query offloading.

1. Index Pruning and Entity Resolution

We audited the search_api index configuration. By writing custom data alterations, we stripped out non‑essential metadata.

```php
use Drupal\search_api\IndexInterface;

/**
 * Implements hook_search_api_index_items_alter().
 *
 * Prunes unnecessary metadata from items before they reach the Solr index.
 */
function my_module_search_api_index_items_alter(IndexInterface $index, array &$items) {
  foreach ($items as $item_id => $item) {
    // Drop revision history and administrative tracking fields from the index.
    $item->getField('revision_log')->setValues([]);
    $item->getField('field_internal_admin_notes')->setValues([]);

    // Flatten nested locations into a single multi-value string field
    // to avoid Solr join queries.
    $coordinates = $item->getField('field_location_coordinates')->getValues();
    // ... logic to simplify data ...
  }
}
```
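The elided flattening step could look like the following standalone sketch. The input shape (an array of `['lat' => …, 'lng' => …]` pairs) is an assumption for illustration, not taken from the project:

```php
<?php

/**
 * Hypothetical helper: collapse nested coordinate arrays into "lat,lng"
 * strings so Solr can index them as a plain multi-valued string field.
 */
function flatten_locations(array $coordinates): array {
  $flattened = [];
  foreach ($coordinates as $point) {
    // Fixed precision keeps the strings stable across re-indexing runs.
    $flattened[] = sprintf('%.6f,%.6f', $point['lat'], $point['lng']);
  }
  return $flattened;
}
```

Because the result is a flat list of strings, Solr can match and facet on it without the join queries that nested entity references would require.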

2. Memcache‑Backed Facet Caching

Faceted search dynamically filters results, and recalculating distinct facet counts on every request is expensive at this scale. We implemented aggressive facet caching backed by Memcached through Drupal's cache API.

```php
use Drupal\Core\Cache\Cache;

// Backend implementation of facet result caching.
public function getCachedFacets($query_hash) {
  $cache = \Drupal::cache('search_api_facets')->get($query_hash);
  if ($cache) {
    return $cache->data; // Instant sub-millisecond return.
  }

  $results = $this->calculateFacetsFromSolr();
  // Cache permanently; the 'search_api_list' tag invalidates entries
  // whenever indexed content changes.
  \Drupal::cache('search_api_facets')->set($query_hash, $results, Cache::PERMANENT, ['search_api_list']);
  return $results;
}
```
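The `$query_hash` key itself can be any deterministic digest of the normalized query parameters. One hypothetical helper (the function name and parameter shape are illustrative, not from the project):

```php
<?php

/**
 * Hypothetical helper: derive a stable cache key from facet query params.
 * Sorting by key first makes the hash order-independent, so the same
 * filters in a different order hit the same cache entry.
 */
function facet_query_hash(array $params): string {
  ksort($params);
  return hash('sha256', serialize($params));
}
```

With this, `getCachedFacets(facet_query_hash($params))` returns identical results for logically identical queries regardless of parameter order.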

3. Semantic Language Tokenization

To handle English, Spanish, Portuguese, and French in a single index without quality loss, we replaced the standard tokenizers with language‑specific stemming and stop‑word filters in schema.xml. This ensures that a Spanish search for “inversión” also matches other forms of the same stem (such as “inversiones”) without polluting the English results.
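A minimal sketch of such a language‑specific field type (here Spanish), using analysis factories that ship with Solr; the field type name is illustrative:

```xml
<fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Spanish stop-word list bundled with Solr -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_es.txt"/>
    <!-- Light stemmer: "inversión" and "inversiones" reduce to the same stem -->
    <filter class="solr.SpanishLightStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Each language gets its own field type and its own fields, so Spanish stemming rules never apply to English text and vice versa.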

The Outcome

Following deployment to production:

  • Query Time: Average search response time dropped from ~3,200 ms to 140 ms.
  • Infrastructure Costs: We were able to scale down the Solr cluster by one tier, saving significant monthly OPEX.

Originally published at VictorStack AI — Drupal & WordPress Reference.
