I Built an Offline-First Semantic Search Engine in JavaScript

Published: December 29, 2025 at 01:57 AM EST
Source: Dev.to

Introduction

Search is deceptively hard. Most JavaScript search libraries stop at keywords or fuzzy matching, and most semantic search solutions assume external APIs, vector databases, or hosted services. I wanted something different:

  • Runs fully locally
  • Works in Node.js or the browser
  • Understands meaning, not just text
  • Doesn’t require standing up new infrastructure

That led me to build Simile Search, an offline‑first semantic + fuzzy search engine in JavaScript.

Core Techniques

Simile combines multiple techniques instead of relying on a single scoring method:

  • Transformer‑based embeddings (via transformers.js) to capture meaning, so a query like “phone charger” can match “USB‑C cable” even when there’s no keyword overlap.
  • HNSW (Hierarchical Navigable Small World) indexing for approximate nearest‑neighbor search, providing sub‑linear search time, predictable performance as the catalog grows, and practical latency for interactive search.
  • Vector quantization to reduce memory usage while keeping similarity quality high, which matters when running inside Node.js, embedding large catalogs, or keeping everything in memory.
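
To make the memory trade‑off concrete, here is a minimal sketch of one common quantization scheme: symmetric int8 scalar quantization, which stores each component in 1 byte instead of the 4 bytes a Float32Array needs. This is an illustration only; the scheme Simile actually uses may differ, and the helper names (cosineSimilarity, quantize, dequantize) and the toy 4‑dimensional vectors are assumptions made for this example.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: symmetric int8 scalar quantization.
// Each component is divided by a per-vector scale and rounded
// to an integer in [-127, 127].
function quantize(vector) {
  let maxAbs = 0;
  for (const v of vector) maxAbs = Math.max(maxAbs, Math.abs(v));
  const scale = maxAbs === 0 ? 1 : maxAbs / 127;
  const values = new Int8Array(vector.length);
  for (let i = 0; i < vector.length; i++) {
    values[i] = Math.round(vector[i] / scale);
  }
  return { values, scale };
}

// Recover an approximate float vector from its quantized form.
function dequantize({ values, scale }) {
  return Float32Array.from(values, (q) => q * scale);
}

// Toy 4-dimensional "embeddings"; real models produce hundreds of dimensions.
const query = [0.12, -0.48, 0.33, 0.8];
const doc = [0.1, -0.5, 0.3, 0.75];

const exact = cosineSimilarity(query, doc);
const approx = cosineSimilarity(
  dequantize(quantize(query)),
  dequantize(quantize(doc))
);

// The two similarities should be close despite the 4x smaller storage.
console.log(exact.toFixed(4), approx.toFixed(4));
```

The rounding error per component is at most half a quantization step, which is why the similarity ranking tends to survive the compression.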

Performance Optimizations

Embedding is the slowest part of semantic search. Simile avoids repeating work by:

  • Caching vectors for previously seen text.
  • Allowing full snapshot save/load, restoring instantly without re‑embedding.
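
The two ideas above can be sketched in a few lines. This is not Simile's API: EmbeddingCache, toSnapshot, fromSnapshot, and the fakeEmbed stand‑in (used so the example runs without downloading a model) are all hypothetical names chosen for illustration.

```javascript
// Hypothetical stand-in for a real embedding model call, so the sketch
// stays runnable offline. It counts invocations to make cache hits visible.
let embedCalls = 0;
function fakeEmbed(text) {
  embedCalls++;
  // Not a real embedding: character codes folded into 4 buckets.
  const vector = [0, 0, 0, 0];
  for (let i = 0; i < text.length; i++) {
    vector[i % 4] += text.charCodeAt(i) / 1000;
  }
  return vector;
}

class EmbeddingCache {
  constructor(embedFn, entries = []) {
    this.embedFn = embedFn;
    this.vectors = new Map(entries); // text -> vector
  }

  // Return the cached vector, embedding the text only on first sight.
  get(text) {
    if (!this.vectors.has(text)) {
      this.vectors.set(text, this.embedFn(text));
    }
    return this.vectors.get(text);
  }

  // Serialize every cached vector to a JSON string.
  toSnapshot() {
    return JSON.stringify([...this.vectors.entries()]);
  }

  // Rebuild a cache from a snapshot without calling embedFn.
  static fromSnapshot(json, embedFn) {
    return new EmbeddingCache(embedFn, JSON.parse(json));
  }
}

const cache = new EmbeddingCache(fakeEmbed);
cache.get("USB-C cable");
cache.get("USB-C cable"); // served from the cache
const callsAfterWarmup = embedCalls;

const restored = EmbeddingCache.fromSnapshot(cache.toSnapshot(), fakeEmbed);
restored.get("USB-C cable"); // already present after restore
const callsAfterRestore = embedCalls;

console.log(callsAfterWarmup, callsAfterRestore);
```

In a real app the snapshot string would be written to disk or to browser storage; JSON is used here only because it keeps the sketch short.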

Scoring Blend

Semantic similarity alone isn’t enough, so Simile blends it with:

  • Fuzzy matching (typos, partial input)
  • Exact keyword boosting (precision)

Scores are normalized before blending so no single method dominates unfairly.

Weights can be tuned depending on your domain.
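
As a rough illustration of what such a blend can look like, the sketch below combines three scores in the range [0, 1] with a weighted average. The function names, the weights, and the hard‑coded semantic score are assumptions for this example, not Simile's actual scoring code; the fuzzy score here is a plain Levenshtein ratio, and other fuzzy measures would work just as well.

```javascript
// Levenshtein edit distance using a single rolling row.
function levenshtein(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let prevDiagonal = row[0];
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const temp = row[j];
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(row[j] + 1, row[j - 1] + 1, prevDiagonal + cost);
      prevDiagonal = temp;
    }
  }
  return row[b.length];
}

// Fuzzy score in [0, 1]: 1 means identical strings.
function fuzzyScore(query, text) {
  const longest = Math.max(query.length, text.length);
  return longest === 0 ? 1 : 1 - levenshtein(query, text) / longest;
}

// Keyword score in [0, 1]: fraction of query tokens found verbatim in the text.
function keywordScore(query, text) {
  const textTokens = new Set(text.toLowerCase().split(/\s+/).filter(Boolean));
  const queryTokens = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (queryTokens.length === 0) return 0;
  const hits = queryTokens.filter((t) => textTokens.has(t)).length;
  return hits / queryTokens.length;
}

// Weighted average of scores, each clamped to [0, 1] first.
function blend(scores, weights) {
  let total = 0;
  let weightSum = 0;
  for (const [name, weight] of Object.entries(weights)) {
    const score = Math.min(1, Math.max(0, scores[name] ?? 0));
    total += weight * score;
    weightSum += weight;
  }
  return weightSum === 0 ? 0 : total / weightSum;
}

// Hypothetical weights; tune these for your own domain.
const weights = { semantic: 0.6, fuzzy: 0.2, keyword: 0.2 };
const query = "usb cabel"; // note the typo
const title = "usb cable";

const scores = {
  semantic: 0.82, // assumed result of an embedding comparison
  fuzzy: fuzzyScore(query, title),
  keyword: keywordScore(query, title),
};
const finalScore = blend(scores, weights);
console.log(finalScore.toFixed(3));
```

Dividing by the sum of the weights keeps the final score in [0, 1] even when the weights don't add up to 1, which makes results comparable after tuning.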

Structured Data Support

Instead of flattening data manually, Simile can search directly across nested paths, e.g.:

  • metadata.author.firstName
  • metadata.tags
  • items[0].name

This makes it practical for real product catalogs and structured data.
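
For readers curious how path strings like these can be resolved, here is a minimal sketch. The helpers parsePath, getByPath, and extractText are hypothetical names for this example and are not taken from Simile's source; the sketch handles only dot notation and numeric bracket indexes.

```javascript
// Split "items[0].name" into ["items", "0", "name"].
function parsePath(path) {
  return path.replace(/\[(\d+)\]/g, ".$1").split(".").filter(Boolean);
}

// Walk the object one key at a time; return undefined if any step is missing.
function getByPath(obj, path) {
  let current = obj;
  for (const key of parsePath(path)) {
    if (current == null) return undefined;
    current = current[key];
  }
  return current;
}

// Collect the searchable text found at the given paths.
// Array values (such as tags) are joined with spaces.
function extractText(doc, paths) {
  return paths
    .map((p) => getByPath(doc, p))
    .filter((v) => v != null)
    .map((v) => (Array.isArray(v) ? v.join(" ") : String(v)))
    .join(" ");
}

const product = {
  metadata: { author: { firstName: "Ada" }, tags: ["cable", "usb-c"] },
  items: [{ name: "Charging cable" }],
};

const text = extractText(product, [
  "metadata.author.firstName",
  "metadata.tags",
  "items[0].name",
]);
console.log(text); // prints "Ada cable usb-c Charging cable"
```

The resulting string is what would then be embedded or fuzzy‑matched, so the original nested document never has to be flattened by hand.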

Ideal Use Cases

Simile works best for:

  • Product & inventory catalogs
  • Internal tools and dashboards
  • Knowledge bases
  • Autocomplete / typeahead search
  • Privacy‑first or offline‑capable apps
  • NestJS backends without extra search infrastructure

It’s not trying to replace Meilisearch, Elasticsearch, or large vector databases. Rather, it targets small‑to‑medium datasets where meaning matters and infrastructure should stay simple.

Why Simile Exists

I kept seeing projects where:

  • A full search engine was overkill.
  • A database existed just to store an index.
  • Fuzzy search wasn’t good enough.
  • Semantic search required too much setup.

Simile is an attempt to close that gap.

Installation & Source

  • npm:
  • GitHub: