I Built an Offline-First Semantic Search Engine in JavaScript

Published: December 29, 2025 at 01:57 AM EST

Source: Dev.to

Introduction

Search is deceptively hard. Most JavaScript search libraries stop at keywords or fuzzy matching, and most semantic search solutions assume external APIs, vector databases, or hosted services. I wanted something different:

Runs fully locally
Works in Node.js or the browser
Understands meaning, not just text
Doesn’t require standing up new infrastructure

That led me to build Simile Search — an offline‑first semantic + fuzzy search engine in JavaScript.

Core Techniques

Simile combines multiple techniques instead of relying on a single scoring method:

Transformer‑based embeddings (via transformers.js) to capture meaning, so a query like “phone charger” can match “USB‑C cable” even when there’s no keyword overlap.
HNSW (Hierarchical Navigable Small World) indexing for approximate nearest‑neighbor search, providing sub‑linear search time, predictable performance as the catalog grows, and practical latency for interactive search.
Vector quantization to reduce memory usage while keeping similarity quality high, which matters when running inside Node.js, embedding large catalogs, or keeping everything in memory.
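
To make the quantization idea concrete, here is a minimal sketch of one common approach, int8 scalar quantization. It is an assumption for illustration only: the article doesn't say which quantization scheme Simile uses, and `quantize` and `cosine` are hypothetical helpers, not part of its API. The toy vectors stand in for real embeddings.

```javascript
// Hypothetical helpers for illustration; not Simile's actual API.

// Map each float in [-1, 1] to a signed 8-bit integer in [-127, 127].
// An Int8Array needs 1 byte per dimension instead of 4 for a Float32Array.
function quantize(vector) {
  const out = new Int8Array(vector.length);
  for (let i = 0; i < vector.length; i++) {
    const clamped = Math.max(-1, Math.min(1, vector[i]));
    out[i] = Math.round(clamped * 127);
  }
  return out;
}

// Cosine similarity works on float or int8 arrays alike: the uniform
// scale factor cancels in the ratio, leaving only rounding error.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy unit-length "embeddings" (real ones have hundreds of dimensions).
const charger = [0.6, 0.8, 0.0];
const cable = [0.8, 0.6, 0.0];

const exact = cosine(charger, cable);
const approx = cosine(quantize(charger), quantize(cable));
console.log(exact.toFixed(3), approx.toFixed(3));
```

The two printed similarities should differ only slightly, which is the trade‑off quantization aims for: a quarter of the memory for a small loss in precision.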

Performance Optimizations

Embedding is the slowest part of semantic search. Simile avoids repeating work by:

Caching vectors for previously seen text.
Allowing full snapshot save/load, restoring instantly without re‑embedding.
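
The caching and snapshot ideas can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not Simile's implementation: `EmbeddingCache` and `fakeEmbed` are hypothetical names, and the toy embed function stands in for a real (slow) model call.

```javascript
// Hypothetical stand-in for a slow embedding model call.
let embedCalls = 0;
function fakeEmbed(text) {
  embedCalls++;
  // Toy "vector": text length and vowel count. A real model would go here.
  const vowels = (text.match(/[aeiou]/gi) || []).length;
  return [text.length, vowels];
}

// Hypothetical cache class for illustration; not Simile's actual API.
class EmbeddingCache {
  constructor(embedFn, entries = []) {
    this.embedFn = embedFn;
    this.vectors = new Map(entries); // text -> vector
  }

  // Return the cached vector, computing it only on a cache miss.
  get(text) {
    if (!this.vectors.has(text)) {
      this.vectors.set(text, this.embedFn(text));
    }
    return this.vectors.get(text);
  }

  // Serialize every cached vector to a JSON string.
  snapshot() {
    return JSON.stringify([...this.vectors.entries()]);
  }

  // Rebuild a cache from a snapshot without calling embedFn.
  static restore(json, embedFn) {
    return new EmbeddingCache(embedFn, JSON.parse(json));
  }
}

const cache = new EmbeddingCache(fakeEmbed);
cache.get("USB-C cable");
cache.get("USB-C cable"); // cache hit, no second embed call
const saved = cache.snapshot();

const restored = EmbeddingCache.restore(saved, fakeEmbed);
restored.get("USB-C cable"); // served from the snapshot
console.log(embedCalls); // 1
```

In a real app the snapshot string would be written to disk or browser storage, so a restart only pays the embedding cost for text it has never seen.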

Scoring Blend

Semantic similarity alone isn’t enough. Simile blends:

Fuzzy matching (typos, partial input)
Exact keyword boosting (precision)
Normalized scoring so no method dominates unfairly

Weights can be tuned depending on your domain.
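
As a rough sketch of what such a blend can look like, the example below min‑max normalizes each method's scores and then takes a weighted average. The function names, the choice of min‑max normalization, and the weights are all assumptions for illustration; the article doesn't specify how Simile normalizes or combines scores.

```javascript
// Rescale raw scores to [0, 1] so different methods are comparable.
function minMaxNormalize(scores) {
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  if (max === min) return scores.map(() => 0); // no signal to spread
  return scores.map((s) => (s - min) / (max - min));
}

// Weighted average of normalized per-method scores for each document.
function blendScores(methodScores, weights) {
  const methods = Object.keys(weights);
  const normalized = {};
  for (const m of methods) normalized[m] = minMaxNormalize(methodScores[m]);

  const totalWeight = methods.reduce((sum, m) => sum + weights[m], 0);
  const docCount = methodScores[methods[0]].length;
  const blended = [];
  for (let i = 0; i < docCount; i++) {
    let score = 0;
    for (const m of methods) score += weights[m] * normalized[m][i];
    blended.push(score / totalWeight);
  }
  return blended;
}

// Raw scores for three documents, on deliberately different scales.
const raw = {
  semantic: [0.82, 0.4, 0.1], // cosine similarity
  fuzzy: [12, 30, 3], // e.g. an edit-distance-based score
  keyword: [0, 1, 0], // exact-match flag
};
const weights = { semantic: 0.6, fuzzy: 0.3, keyword: 0.1 };

const blended = blendScores(raw, weights);
console.log(blended.map((s) => s.toFixed(2)));
```

Without the normalization step, the fuzzy scores (which run up to 30 here) would swamp the cosine similarities no matter how the weights were set.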

Structured Data Support

Instead of flattening data manually, Simile can search directly across nested paths, e.g.:

metadata.author.firstName
metadata.tags
items[0].name

This makes it practical for real product catalogs and structured data.
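
For readers curious how path‑based lookup like this can work, here is a small sketch of a resolver for the path syntax shown above. `parsePath`, `getByPath`, and `extractText` are hypothetical helpers written for this example, not Simile's API, and the sample product is made up.

```javascript
// Split "items[0].name" into ["items", "0", "name"].
function parsePath(path) {
  return path.replace(/\[(\d+)\]/g, ".$1").split(".").filter(Boolean);
}

// Walk the object along the path; return undefined if any step is missing.
function getByPath(obj, path) {
  let current = obj;
  for (const key of parsePath(path)) {
    if (current == null) return undefined;
    current = current[key];
  }
  return current;
}

// Collect searchable text from several paths, flattening arrays of strings.
function extractText(record, paths) {
  return paths
    .map((p) => getByPath(record, p))
    .flat()
    .filter((v) => typeof v === "string")
    .join(" ");
}

const product = {
  metadata: { author: { firstName: "Ada" }, tags: ["cables", "usb-c"] },
  items: [{ name: "USB-C cable" }],
};

const text = extractText(product, [
  "metadata.author.firstName",
  "metadata.tags",
  "items[0].name",
]);
console.log(text); // "Ada cables usb-c USB-C cable"
```

The resulting string is what would then be embedded or fuzzy‑matched, so the original nested record never has to be flattened by hand.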

Ideal Use Cases

Simile works best for:

Product & inventory catalogs
Internal tools and dashboards
Knowledge bases
Autocomplete / typeahead search
Privacy‑first or offline‑capable apps
NestJS backends without extra search infrastructure

It’s not trying to replace Meilisearch, Elasticsearch, or large vector databases. Instead, it targets small‑to‑medium datasets where meaning matters and infrastructure should stay simple.

Why Simile Exists

I kept seeing projects where:

A full search engine was overkill.
A database existed just to store an index.
Fuzzy search wasn’t good enough.
Semantic search required too much setup.

Simile is an attempt to close that gap.

Installation & Source

npm:
GitHub:
