I Built an Offline-First Semantic Search Engine in JavaScript
Source: Dev.to
Introduction
Search is deceptively hard. Most JavaScript search libraries stop at keywords or fuzzy matching, and most semantic search solutions assume external APIs, vector databases, or hosted services. I wanted something different:
- Runs fully locally
- Works in Node.js or the browser
- Understands meaning, not just text
- Doesn’t require standing up new infrastructure
That led me to build Simile Search — an offline‑first semantic + fuzzy search engine in JavaScript.
Core Techniques
Simile combines multiple techniques instead of relying on a single scoring method:
- Transformer‑based embeddings (via transformers.js) to capture meaning, so a query like “phone charger” can match “USB‑C cable” even when there’s no keyword overlap.
- HNSW (Hierarchical Navigable Small World) indexing for approximate nearest‑neighbor search, providing sub‑linear search time, predictable performance as the catalog grows, and practical latency for interactive search.
- Vector quantization to reduce memory usage while keeping similarity quality high, which matters when running inside Node.js, embedding large catalogs, or keeping everything in memory.
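To make the quantization trade-off concrete, here is a minimal sketch of one common approach: scalar quantization to 8‑bit integers. It is an illustration only, not Simile's actual implementation; the function names `cosine`, `quantize`, and `dequantize` are hypothetical.

```javascript
// Hypothetical helpers for illustration; not Simile's API.

// Cosine similarity between two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scalar quantization: map each float to an int8 in [-127, 127],
// keeping one scale factor per vector so it can be approximately restored.
function quantize(vec) {
  let maxAbs = 0;
  for (const v of vec) maxAbs = Math.max(maxAbs, Math.abs(v));
  const scale = maxAbs === 0 ? 1 : maxAbs / 127;
  const data = new Int8Array(vec.length);
  for (let i = 0; i < vec.length; i++) data[i] = Math.round(vec[i] / scale);
  return { scale, data };
}

// Approximate reconstruction of the original float vector.
function dequantize({ scale, data }) {
  return Float32Array.from(data, (q) => q * scale);
}

const original = new Float32Array([0.12, -0.5, 0.33, 0.9]);
const packed = quantize(original);
console.log(packed.data.byteLength, "bytes instead of", original.byteLength);
console.log("similarity to original:", cosine(original, dequantize(packed)));
```

An `Int8Array` uses one byte per dimension where a `Float32Array` uses four, so this scheme cuts vector storage to roughly a quarter, at the cost of a small rounding error in each component.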
Performance Optimizations
Embedding is the slowest part of semantic search. Simile avoids repeating work by:
- Caching vectors for previously seen text.
- Allowing full snapshot save/load, so an index can be restored without re‑embedding anything.
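The two ideas above can be sketched with a small cache wrapper. This is a hedged illustration rather than Simile's real API: `EmbeddingCache`, `toSnapshot`, and `fromSnapshot` are hypothetical names, and `embedFn` stands in for any async function that turns text into a plain array of numbers.

```javascript
// Hypothetical cache wrapper for illustration; not Simile's API.
// Assumes embedFn(text) resolves to a plain number[] (JSON-serializable).
class EmbeddingCache {
  constructor(embedFn) {
    this.embedFn = embedFn;
    this.vectors = new Map(); // text -> number[]
  }

  // Return the cached vector, embedding only on the first request.
  async get(text) {
    if (!this.vectors.has(text)) {
      this.vectors.set(text, await this.embedFn(text));
    }
    return this.vectors.get(text);
  }

  // Serialize every cached vector to a JSON string.
  toSnapshot() {
    return JSON.stringify([...this.vectors]);
  }

  // Rebuild a cache from a snapshot without calling embedFn.
  static fromSnapshot(json, embedFn) {
    const cache = new EmbeddingCache(embedFn);
    cache.vectors = new Map(JSON.parse(json));
    return cache;
  }
}
```

The snapshot string can be written to disk in Node.js or to browser storage. If the embedding function returns a typed array, it would need converting with `Array.from` before caching, since typed arrays do not round-trip through JSON as arrays.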
Scoring Blend
Semantic similarity alone isn’t enough. Simile blends:
- Fuzzy matching (typos, partial input)
- Exact keyword boosting (precision)
- Normalized scoring so no method dominates unfairly
Weights can be tuned depending on your domain.
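One way to picture the blend is min‑max normalization per signal followed by a weighted sum. The sketch below assumes that approach; the function names, field names (`semantic`, `fuzzy`, `keyword`), and default weights are all hypothetical and not taken from Simile.

```javascript
// Hypothetical scoring helpers for illustration; not Simile's API.

// Rescale a list of scores to [0, 1] so signals are comparable.
function minMax(scores) {
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  if (max === min) return scores.map(() => 0); // signal carries no ranking information
  return scores.map((s) => (s - min) / (max - min));
}

// Blend per-candidate raw scores into one ranked list.
// The default weights are arbitrary examples, not recommended values.
function blend(candidates, weights = { semantic: 0.6, fuzzy: 0.25, keyword: 0.15 }) {
  const keys = Object.keys(weights);
  const normalized = {};
  for (const k of keys) normalized[k] = minMax(candidates.map((c) => c[k]));
  return candidates
    .map((c, i) => ({
      id: c.id,
      score: keys.reduce((sum, k) => sum + weights[k] * normalized[k][i], 0),
    }))
    .sort((a, b) => b.score - a.score);
}

const ranked = blend([
  { id: "a", semantic: 0.9, fuzzy: 0.1, keyword: 0 },
  { id: "b", semantic: 0.2, fuzzy: 0.8, keyword: 1 },
  { id: "c", semantic: 0.5, fuzzy: 0.3, keyword: 0 },
]);
console.log(ranked.map((r) => r.id));
```

Shifting weight toward `fuzzy` and `keyword` would favor literal matches, which may suit catalogs full of part numbers. Shifting it toward `semantic` favors meaning over spelling.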
Structured Data Support
Instead of flattening data manually, Simile can search directly across nested paths, e.g.:
- metadata.author.firstName
- metadata.tags
- items[0].name
This makes it practical for real product catalogs and structured data.
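For a sense of what path-based extraction involves, here is a minimal sketch of resolving dotted and bracketed paths against a document. `getPath` and `extractText` are hypothetical helpers written for this example and are not Simile's API.

```javascript
// Hypothetical helpers for illustration; not Simile's API.

// Resolve a path such as "items[0].name" against a nested object.
// Returns undefined if any segment along the way is missing.
function getPath(obj, path) {
  // "items[0].name" -> ["items", "0", "name"]
  const keys = path.replace(/\[(\d+)\]/g, ".$1").split(".").filter(Boolean);
  let current = obj;
  for (const key of keys) {
    if (current == null) return undefined;
    current = current[key];
  }
  return current;
}

// Gather the string values found at several paths into one searchable text.
// Arrays of strings (e.g. tags) are flattened one level.
function extractText(doc, paths) {
  return paths
    .map((p) => getPath(doc, p))
    .flat()
    .filter((v) => typeof v === "string")
    .join(" ");
}

const doc = {
  metadata: { author: { firstName: "Ada" }, tags: ["cables", "usb-c"] },
  items: [{ name: "Charger" }],
};
console.log(extractText(doc, ["metadata.author.firstName", "metadata.tags", "items[0].name"]));
```

The combined text could then be passed to the embedding and fuzzy-matching steps, so the original document never has to be flattened by hand.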
Ideal Use Cases
Simile works best for:
- Product & inventory catalogs
- Internal tools and dashboards
- Knowledge bases
- Autocomplete / typeahead search
- Privacy‑first or offline‑capable apps
- NestJS backends without extra search infrastructure
It’s not trying to replace MeiliSearch, Elastic, or large vector databases. Instead, it targets small‑to‑medium datasets where meaning matters and infrastructure should stay simple.
Why Simile Exists
I kept seeing projects where:
- A full search engine was overkill.
- A database existed just to store an index.
- Fuzzy search wasn’t good enough.
- Semantic search required too much setup.
Simile is an attempt to close that gap.
Installation & Source
- npm:
- GitHub: