Accelerating Solr Search for Global Financial Institutions
Source: Dev.to

When a global financial institution processes thousands of multi‑lingual regulatory reports, public development indicators, and economic whitepapers every day, search cannot merely work; it must be instantaneous and exact.
In a recent architecture overhaul for a major international development organization running Drupal, we encountered a critical performance bottleneck: the primary Apache Solr search cluster was buckling under the weight of complex faceted queries and multi‑lingual taxonomy translations. Query execution times were spiking over 3 seconds during peak traffic hours, rendering the “Knowledge Center” unusable for researchers.
```mermaid
graph LR
    A[User Query] --> B[Decoupled Search UI]
    B -->|Async Request| C[Edge Cache / Memcache]
    C -->|Miss| D[Apache Solr Cluster]
    D -->|Faceted Results| C
    C -->|Hit| B
    E[Drupal Origin] -->|Flattened Sync| D
```
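The request path in the diagram is a classic cache-aside pattern. The sketch below models it in plain PHP, assuming an in-memory array in place of Memcache, a stub closure in place of the Solr cluster, and a hypothetical 300-second TTL. The class name `CacheAsideSearch` is illustrative and not part of the original stack.

```php
<?php
// Cache-aside sketch of the diagram's read path: check the cache first and
// only fall through to Solr on a miss. The in-memory array, the TTL value and
// the stub backend are assumptions standing in for Memcache and real Solr.

final class CacheAsideSearch
{
    /** @var array<string, array{expires: int, data: array}> */
    private array $cache = [];

    /** How many times the backend (Solr stand-in) was actually queried. */
    public int $backendCalls = 0;

    public function __construct(
        private \Closure $backend, // fn(string $query): array
        private int $ttl = 300,    // seconds; hypothetical value
    ) {}

    public function search(string $query, int $now): array
    {
        $key = 'search:' . sha1($query);
        $entry = $this->cache[$key] ?? null;
        if ($entry !== null && $entry['expires'] > $now) {
            return $entry['data']; // Hit: Solr is not contacted.
        }
        // Miss or expired entry: query the backend and repopulate the cache.
        $this->backendCalls++;
        $data = ($this->backend)($query);
        $this->cache[$key] = ['expires' => $now + $this->ttl, 'data' => $data];
        return $data;
    }
}

$search = new CacheAsideSearch(fn (string $q): array => ['query' => $q, 'hits' => 42]);
$a = $search->search('debt sustainability', 1000); // miss: backend queried
$b = $search->search('debt sustainability', 1100); // hit: within the TTL
$c = $search->search('debt sustainability', 1400); // expired: backend queried again
```

A TTL bounds how stale a cached response can get without any invalidation logic; the facet cache in this article takes the other route and relies on cache tags instead.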
The Architecture Bottleneck
The existing implementation utilized Drupal’s search_api_solr module in a standard configuration. While adequate for smaller datasets, it failed at enterprise scale due to:
- Over‑indexing: Every node revision and paragraph entity was being indexed unnecessarily, ballooning the index size.
- Inefficient Faceting: Distinct facet counts were recalculated across millions of records on every request, with no caching.
- Language Fallbacks: Solr struggled with complex edge n‑gram tokenization across English, Spanish, Portuguese, and French simultaneously.
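To see why edge n‑grams strain an index, here is a hedged sketch of the prefixes such a filter emits for a single token. The function `edgeNgrams()` is a hypothetical helper; its parameter names mirror `minGramSize` and `maxGramSize` on Solr's `EdgeNGramFilterFactory`, and the values used are illustrative.

```php
<?php
// Sketch of the prefixes an edge n-gram filter emits for one token.
// Parameter names mirror Solr's EdgeNGramFilterFactory; values are illustrative.

function edgeNgrams(string $token, int $minGramSize, int $maxGramSize): array
{
    // Split into Unicode characters so accented letters (e.g. "ó") count once.
    $chars = preg_split('//u', $token, -1, PREG_SPLIT_NO_EMPTY);
    $grams = [];
    $upper = min($maxGramSize, count($chars));
    for ($n = $minGramSize; $n <= $upper; $n++) {
        // Each gram is the first $n characters of the token.
        $grams[] = implode('', array_slice($chars, 0, $n));
    }
    return $grams;
}

// One nine-letter word becomes eight indexed terms: "in", "inv", ..., "inversión".
$grams = edgeNgrams('inversión', 2, 15);
```

When `maxGramSize` covers the whole token, a token of length L yields L − minGramSize + 1 terms, so edge n‑gram fields grow quickly as vocabulary and document counts rise.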
The Optimization Strategy
Rather than simply provisioning larger AWS instances, we implemented a software‑level optimization strategy focusing on indexing efficiency and query offloading.
1. Index Pruning and Entity Resolution
We audited the search_api index configuration, then used a custom hook_search_api_index_items_alter() implementation to strip non‑essential metadata from each item before indexing.
```php
use Drupal\search_api\IndexInterface;

/**
 * Implements hook_search_api_index_items_alter().
 *
 * Prunes non-essential fields from items before they are sent to Solr.
 */
function my_module_search_api_index_items_alter(IndexInterface $index, array &$items) {
  foreach ($items as $item_id => $item) {
    // Drop revision history and administrative tracking fields from the index.
    // Passing NULL to setField() removes the field from the item.
    $item->setField('revision_log', NULL);
    $item->setField('field_internal_admin_notes', NULL);

    // Flatten nested locations into a single multi-valued string field
    // to avoid Solr join queries.
    $location_field = $item->getField('field_location_coordinates');
    $flattened = $location_field ? $location_field->getValues() : [];
    // ... logic to simplify data ...
  }
}
```
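The Drupal hook above depends on Search API's item objects. As a framework-free sketch of the same idea, the hypothetical `pruneAndFlatten()` below works on a plain document array: it drops the two unneeded fields and rewrites nested coordinate pairs into a flat list of `"lat,lon"` strings. The document shape and the string format are assumptions; `lat,lon` is the input form Solr's spatial point fields accept.

```php
<?php
// Framework-free sketch of index pruning and flattening on a plain array.
// Field names follow the article; the document shape is an assumption.

const PRUNED_FIELDS = ['revision_log', 'field_internal_admin_notes'];

function pruneAndFlatten(array $doc): array
{
    // Drop fields that are never searched or displayed.
    foreach (PRUNED_FIELDS as $field) {
        unset($doc[$field]);
    }
    // Replace nested location structures with a flat multi-valued string
    // field, so no Solr join is needed at query time.
    if (isset($doc['field_location_coordinates'])) {
        $doc['field_location_coordinates'] = array_map(
            fn (array $loc): string => $loc['lat'] . ',' . $loc['lon'],
            $doc['field_location_coordinates'],
        );
    }
    return $doc;
}

$out = pruneAndFlatten([
    'title' => 'Debt report',
    'revision_log' => 'Fixed typo',
    'field_internal_admin_notes' => 'Check with legal',
    'field_location_coordinates' => [
        ['lat' => 38.9, 'lon' => -77.04],
        ['lat' => -15.8, 'lon' => -47.9],
    ],
]);
```

Doing this work once at index time keeps every query cheap, which is the trade-off the pruning step is after.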
2. Memcache‑Backed Facet Caching
Faceted search filters results dynamically, and recalculating facet counts on every keystroke puts heavy load on the Solr cluster. We added aggressive facet caching backed by Memcached.
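```php
// Backend implementation of facet result caching. The 'search_api_facets'
// cache bin is assumed to be declared as a service and mapped to Memcache
// in settings.php.
public function getCachedFacets($index_id, $query_hash) {
  $cache = \Drupal::cache('search_api_facets')->get($query_hash);
  if ($cache) {
    return $cache->data; // Cache hit: served from Memcache, Solr is not queried.
  }
  $results = $this->calculateFacetsFromSolr();
  // Search API invalidates the "search_api_list:INDEX_ID" tag when the index
  // changes, so this permanent entry is still cleared after re-indexing.
  \Drupal::cache('search_api_facets')->set(
    $query_hash,
    $results,
    \Drupal\Core\Cache\Cache::PERMANENT,
    ['search_api_list:' . $index_id]
  );
  return $results;
}
```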
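`getCachedFacets()` is only as effective as its `$query_hash`: two requests that differ only in filter order or stray whitespace should land on the same cache entry. A minimal sketch of a canonical hash, assuming a hypothetical `facetQueryHash()` helper and a simple keywords/filters/language parameter layout:

```php
<?php
// Sketch of a canonical cache key for facet results. The parameter layout
// (keywords, filter map, language code) is a hypothetical example.

function facetQueryHash(string $keys, array $filters, string $langcode): string
{
    // Normalise the keyword string: collapse runs of whitespace and trim.
    $keys = trim(preg_replace('/\s+/u', ' ', $keys));
    // Sort filter names and each filter's values so their order does not matter.
    ksort($filters, SORT_STRING);
    foreach ($filters as &$values) {
        sort($values, SORT_STRING);
    }
    unset($values); // Break the reference left by the foreach.
    $canonical = json_encode(['k' => $keys, 'f' => $filters, 'l' => $langcode]);
    return hash('sha256', $canonical);
}

$a = facetQueryHash('  debt   relief ', ['region' => ['LAC', 'AFR'], 'type' => ['report']], 'en');
$b = facetQueryHash('debt relief', ['type' => ['report'], 'region' => ['AFR', 'LAC']], 'en');
$c = facetQueryHash('debt relief', ['type' => ['report'], 'region' => ['AFR', 'LAC']], 'es');
```

Including the language code keeps Spanish and English facet counts in separate entries; SHA‑256 is used here only to produce a fixed-length key.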
3. Semantic Language Tokenization
To handle English, Spanish, and French in a single index without quality loss, we replaced the generic analyzer chain with language‑specific stemming and stop‑word filters in schema.xml, one field type per language. This ensures that a Spanish search such as “inversión” is stemmed by Spanish rules and matched against Spanish text, without polluting the English results with documents that merely share a similar stem.
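One way to keep Spanish stemming away from English documents is to send each query only to the field analyzed for the user's language. The sketch below builds Solr request parameters that way. The `text_en`/`text_es`/`text_fr` field names and the `langcode` filter field are assumptions (search_api_solr generates its own field names), while `defType`, `qf`, and `fq` are standard Solr query parameters.

```php
<?php
// Sketch of routing a query to a language-specific field. Field names and
// the "langcode" filter field are hypothetical; search_api_solr uses its own.

const LANGUAGE_FIELDS = [
    'en' => 'text_en',
    'es' => 'text_es',
    'fr' => 'text_fr',
];

function buildLanguageQuery(string $keys, string $langcode): array
{
    if (!array_key_exists($langcode, LANGUAGE_FIELDS)) {
        throw new InvalidArgumentException("Unsupported language: $langcode");
    }
    return [
        'q' => $keys,
        'defType' => 'edismax',
        // Search only the field analyzed with this language's stemmer.
        'qf' => LANGUAGE_FIELDS[$langcode],
        // Restrict results to documents in the same language.
        'fq' => 'langcode:' . $langcode,
    ];
}

$params = buildLanguageQuery('inversión', 'es');
```

Because each field type carries its own analyzer, the Spanish stemmer runs only on `text_es` at both index and query time, which is what keeps the English results clean.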
The Outcome
Following deployment to production:
- Query Time: Average search response time dropped from ~3,200 ms to 140 ms.
- Infrastructure Costs: We were able to scale down the Solr cluster by one tier, saving significant monthly OPEX.
Originally published at VictorStack AI — Drupal & WordPress Reference.