RAG with MongoDB Vector Search: Part 1

Published: December 11, 2025 at 06:24 PM EST
Source: Dev.to

What is RAG?

Retrieval‑augmented generation (RAG) is a technique that lets a large language model (LLM) retrieve information from outside its training data and incorporate it into its response. By supplying relevant context alongside the prompt, RAG helps the model deliver more accurate, grounded answers.

Why Vector Search Matters in RAG

Traditional keyword search breaks down when queries are vague, paraphrased, or semantically rich. Vector search solves this by representing text as high‑dimensional embeddings that encode meaning rather than literal wording.

Embedding

Converts documents and user queries into high‑dimensional vectors that capture their semantic meaning.

Indexing

Uses these vectors to build an approximate nearest‑neighbor (ANN) structure such as HNSW, IVF‑Flat, or PQ, enabling efficient similarity search.

Retrieval

Embeds the incoming query and compares it against the indexed vectors, returning the closest matches based on semantic similarity.
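
The three steps above can be sketched end to end in a few lines of Python. This is only an illustration under loud assumptions: `toy_embed` is a hypothetical word-count stand-in for a real embedding model, the "index" is a plain list scanned linearly rather than an ANN structure such as HNSW, and the vocabulary and shortened documents are made up for the example.

```python
import math
import re
from collections import Counter


def toy_embed(text: str, vocabulary: list[str]) -> list[float]:
    """Hypothetical stand-in for an embedding model: counts vocabulary words."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    return [float(tokens[word]) for word in vocabulary]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, index: list[tuple[str, list[float]]],
             vocabulary: list[str], limit: int = 2) -> list[tuple[float, str]]:
    """Embed the query, score every indexed vector, return the best matches."""
    query_vector = toy_embed(query, vocabulary)
    scored = [(cosine_similarity(query_vector, vector), text)
              for text, vector in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# Assumed toy vocabulary and shortened documents, for illustration only.
VOCABULARY = ["bun", "node", "edge", "framework", "api"]
DOCUMENTS = [
    "Hono is a lightweight framework for edge execution.",
    "Elysia is a Bun optimized web framework.",
    "Express is a Node framework used to build REST API services.",
]

# Embedding + indexing: one vector per document (brute-force list, not ANN).
INDEX = [(doc, toy_embed(doc, VOCABULARY)) for doc in DOCUMENTS]

# Retrieval: the Elysia document ranks first for a Bun-related query.
results = retrieve("Which framework integrates best with Bun?", INDEX, VOCABULARY)
for score, text in results:
    print(f"{score:.2f}  {text}")
```

A real system would swap `toy_embed` for an embedding model and the linear scan for an ANN index; the shape of the flow stays the same.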

Example Scenario

Sources

  • Source 1 – “Hono is a fast, lightweight JavaScript framework built on Web Standards. It focuses on low overhead, edge‑friendly execution, and a minimal API surface.”
  • Source 2 – “Elysia is a Bun‑optimized web framework that provides strong typing, schema validation, and excellent performance. It is designed for building scalable HTTP services with good developer ergonomics.”
  • Source 3 – “Express is a minimalistic and widely adopted Node.js framework. It is commonly used to build REST APIs because of its simplicity, extensive ecosystem, and flexible middleware model.”

User query

How can I build a backend service in JavaScript, with better Bun runtime integration?

When this query is embedded, the resulting vector represents concepts such as backend service, API development, HTTP frameworks, and JavaScript server‑side technologies.

Typical embedding models (Voyage, OpenAI, HuggingFace, etc.) generate vectors with roughly 512 to 3072 dimensions. An illustrative vector, shortened to 60 values for readability:

[
  0.0182, -0.0925, 0.0441, 0.0107, -0.0713, 0.1234, -0.0089, 0.0562,
  -0.0041, 0.0977, 0.0229, -0.0335, 0.1412, -0.0611, 0.0054, 0.0883,
  -0.0122, 0.0745, -0.1099, 0.0671, 0.0144, -0.0528, 0.0995, -0.0173,
  0.0811, -0.0442, 0.0368, 0.1210, -0.0075, 0.0932, -0.0661, 0.0152,
  0.0473, -0.0891, 0.1329, 0.0287, -0.0174, 0.0721, -0.0554, 0.1012,
  0.0069, -0.0312, 0.1184, -0.0251, 0.0526, 0.0048, -0.0903, 0.1301,
  0.0110, -0.0782, 0.0433, 0.0271, -0.0622, 0.0999, -0.0148, 0.0711,
  0.0835, -0.0222, 0.0579, -0.0384
]

Vector search compares this query vector with the vectors generated from the sources using the index. The similarity search identifies which sources are semantically closest to the intent of the query and retrieves them.

Retrieval Result

[
  {
    "text": "Elysia is a Bun‑optimized web framework that provides strong typing, schema validation, and excellent performance. It is designed for building scalable HTTP services with good developer ergonomics.",
    "score": 0.91
  },
  {
    "text": "Express is a minimalistic and widely adopted Node.js framework. It is commonly used to build REST APIs because of its simplicity, extensive ecosystem, and flexible middleware model.",
    "score": 0.78
  },
  {
    "text": "Hono is a fast, lightweight JavaScript framework built on Web Standards. It focuses on low overhead, edge‑friendly execution, and a minimal API surface.",
    "score": 0.61
  }
]
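
Once results like these come back, the "augmented" part of RAG is to place them in the prompt sent to the LLM. A minimal sketch follows; the `build_prompt` helper, the prompt wording, and the `min_score` cutoff of 0.7 are all assumptions made for illustration, and the source texts are shortened.

```python
def build_prompt(question: str, retrieved: list[dict], min_score: float = 0.7) -> str:
    """Assemble a grounded prompt from retrieved chunks above a score cutoff."""
    # Drop weak matches so they do not dilute the context (cutoff is an assumption).
    kept = [item for item in retrieved if item["score"] >= min_score]
    context = "\n".join(
        f"[{i}] {item['text']}" for i, item in enumerate(kept, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Shortened versions of the retrieval result, with the same scores.
retrieved = [
    {"text": "Elysia is a Bun-optimized web framework.", "score": 0.91},
    {"text": "Express is a minimalistic Node.js framework.", "score": 0.78},
    {"text": "Hono is a fast, lightweight JavaScript framework.", "score": 0.61},
]

prompt = build_prompt(
    "How can I build a backend service in JavaScript, with better Bun runtime integration?",
    retrieved,
)
print(prompt)
```

With this cutoff, the Elysia and Express passages reach the model and the Hono passage is left out. Whether to filter by score or simply keep the top results is a design choice that depends on the application.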

MongoDB Atlas Vector Search brings vector similarity, metadata filtering, and document storage into a single, unified system. Instead of splitting your stack between a vector database and an operational database, Atlas lets you keep embeddings, raw documents, and application data side‑by‑side. This removes the overhead of synchronizing two systems, reduces latency, and simplifies your architecture.

For RAG pipelines, this matters: you can store the original sources, their embeddings, and any contextual metadata (tags, timestamps, access rules, versions) all in one place and query everything in a single round trip.

How MongoDB Vector Search Works

MongoDB stores your embeddings inside collections as arrays of numbers, just like any other field. When you enable vector search on that field, Atlas builds an ANN index optimized for fast semantic similarity lookup.
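
As a rough illustration, a stored source could look like the dictionary built below. Only the `embedding` field name comes from the index definition in this article; `text`, `tags`, and `createdAt` are hypothetical names, and the three-value vector is shortened for readability.

```python
from datetime import datetime, timezone


def make_source_document(text: str, embedding: list[float], tags: list[str]) -> dict:
    """Build one document holding the raw text, its vector, and metadata."""
    return {
        "text": text,                                 # original source passage
        "embedding": [float(v) for v in embedding],   # plain array of numbers
        "tags": list(tags),                           # hypothetical metadata field
        "createdAt": datetime.now(timezone.utc),      # hypothetical metadata field
    }


doc = make_source_document(
    "Elysia is a Bun-optimized web framework.",
    [0.0182, -0.0925, 0.0441],  # a real vector would have hundreds of values
    ["bun", "framework"],
)
# With PyMongo, a dict like this could be passed to collection.insert_one(doc).
print(sorted(doc.keys()))
```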

[Figure: MongoDB Atlas Vector Search diagram]

When a query comes in, your application (or an LLM workflow) embeds the input and passes the resulting vector to Atlas. Atlas compares that query vector against the indexed vectors and returns the closest documents under the configured similarity metric.

Creating a Vector Search Index in MongoDB Atlas

To enable vector search in MongoDB Atlas, define a vector index on the field that stores your embeddings. This index tells Atlas how to structure the ANN graph (HNSW) and which similarity metric to use.

Index Definition

{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    }
  ]
}

  • path – the field where each document stores its vector representation (here, embedding).
  • numDimensions – must match the output size of the embedding model you use.
  • similarity – defines how closeness is calculated during retrieval (cosine, euclidean, or dotProduct).
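
To make the choice of similarity metric concrete, the sketch below computes cosine similarity, dot product, and Euclidean distance by hand for the same pair of vectors. The two-dimensional vectors are made up for illustration, and these are the raw textbook formulas, not the normalized scores Atlas reports.

```python
import math


def dot_product(a: list[float], b: list[float]) -> float:
    """Sum of element-wise products; sensitive to vector magnitude."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product divided by both lengths; depends only on direction."""
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))


def euclidean(a: list[float], b: list[float]) -> float:
    """Straight-line distance between the two points."""
    return math.dist(a, b)


# b points in the same direction as a but is twice as long.
a = [1.0, 2.0]
b = [2.0, 4.0]

metrics = {
    "cosine": cosine(a, b),
    "dotProduct": dot_product(a, b),
    "euclidean": euclidean(a, b),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Cosine treats the two vectors as a perfect match because they point the same way, while the dot product and Euclidean distance both react to the difference in length. That is one reason cosine is a common default for text embeddings.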

Vector Search Query

{
  "$vectorSearch": {
    "index": "frameworks_vector_index",
    "path": "embedding",
    "queryVector": [/* query embedding values */],
    "numCandidates": 50,
    "limit": 3
  }
}

Atlas traverses the HNSW graph with the queryVector, identifies the closest nodes based on the configured similarity metric, and returns the top results.
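
From application code, the stage above is typically one step of an aggregation pipeline. The helper below is a sketch that only builds the pipeline as plain Python data, so it runs without a database; `build_vector_search_pipeline` is a hypothetical name, and the `text` field in the projection is an assumption about how documents are stored.

```python
def build_vector_search_pipeline(query_vector: list[float],
                                 limit: int = 3,
                                 num_candidates: int = 50) -> list[dict]:
    """Return an aggregation pipeline: vector search, then a slim projection."""
    # numCandidates is the pool the ANN search considers; it cannot be
    # smaller than the number of results requested.
    if num_candidates < limit:
        raise ValueError("num_candidates must be >= limit")
    return [
        {
            "$vectorSearch": {
                "index": "frameworks_vector_index",
                "path": "embedding",
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                "text": 1,  # assumed field holding the source passage
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


pipeline = build_vector_search_pipeline([0.0182, -0.0925, 0.0441])
# With PyMongo, this list could be passed to collection.aggregate(pipeline).
print(pipeline[0]["$vectorSearch"]["limit"])
```

Raising `numCandidates` generally improves recall at the cost of latency, so it is worth tuning against real queries rather than treating 50 as a fixed value.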

A critical requirement: the same embedding model must be used for both stored documents and incoming queries. Mixing models or versions breaks vector compatibility and degrades similarity search.
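
One way to guard against this, sketched here as an assumption rather than anything Atlas enforces, is to record which model produced the stored vectors and check it before every query. The `EmbeddingSpec` class and the model names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingSpec:
    """Identifies the model (and vector size) that produced a set of embeddings."""
    model: str
    dimensions: int


def assert_compatible(stored: EmbeddingSpec, query: EmbeddingSpec) -> None:
    """Raise if the query embedder differs from the one used at indexing time."""
    if stored != query:
        raise ValueError(
            f"Embedding mismatch: stored vectors use {stored}, query uses {query}"
        )


# Hypothetical model names, for illustration only.
stored_spec = EmbeddingSpec(model="example-embed-v1", dimensions=1536)
query_spec = EmbeddingSpec(model="example-embed-v2", dimensions=1536)

try:
    assert_compatible(stored_spec, query_spec)
    mismatch_detected = False
except ValueError as error:
    mismatch_detected = True
    print(error)
```

Note that matching dimensions alone is not enough: two models can emit vectors of the same size that live in unrelated spaces, which is why the check compares the model name as well.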

Conclusion

This overview shows how RAG, vector search, and MongoDB Atlas fit together in a practical workflow. Stay tuned for future articles exploring deeper RAG architectures, vector indexing strategies, hybrid search, and more.
