Atlas Search score details (the BM25 calculation)

Published: December 19, 2025 at 05:05 PM EST
5 min read
Source: Dev.to

Revisiting “Text Search With MongoDB (BM25 TF‑IDF) and PostgreSQL”

Back in October, Franck Pachot from MongoDB (love your work!) published a post comparing text search in MongoDB and PostgreSQL (using both the built‑in tsvector and ParadeDB’s pg_search extension). I won’t recap the whole article, but the key takeaway was that MongoDB behaved exactly as expected, returning BM25 scores that matched the theoretical calculation.

Score Details

The MongoDB Atlas Score Details documentation explains how the score is computed. Below is the test case I used (the same as in my previous blog post).

Test Data

db.articles.drop();   // start from an empty collection (a deleteMany after drop would be redundant)

db.articles.insertMany([
  { description: "🍏 🍌 🍊" },                     // short, 1 🍏
  { description: "🍎 🍌 🍊" },                     // short, 1 🍎
  { description: "🍎 🍌 🍊 🍎" },                  // larger, 2 🍎
  { description: "🍎 🍌 🍊 🍊 🍊" },               // larger, 1 🍎
  { description: "🍎 🍌 🍊 🌴 🫐 🍈 🍇 🌰" },      // large, 1 🍎
  { description: "🍎 🍎 🍎 🍎 🍎 🍎" },           // large, 6 🍎
  { description: "🍎 🍌" },                       // very short, 1 🍎
  { description: "🍌 🍊 🌴 🫐 🍈 🍇 🌰 🍎" },      // large, 1 🍎
  { description: "🍎 🍎 🍌 🍌 🍌" }               // shorter, 2 🍎
]);

db.articles.createSearchIndex("default", {
  mappings: { dynamic: true }
});

Query with Score Details

db.articles.aggregate([
  {
    $search: {
      text: { query: ["🍎", "🍏"], path: "description" },
      index: "default",
      scoreDetails: true
    }
  },
  {
    $project: {
      _id: 0,
      description: 1,
      score: { $meta: "searchScore" },
      scoreDetails: { $meta: "searchScoreDetails" }
    }
  },
  { $sort: { score: -1 } },
  { $limit: 1 }
]);

Result

[
  {
    "description": "🍏 🍌 🍊",
    "score": 1.0242118835449219,
    "scoreDetails": {
      "value": 1.0242118835449219,
      "description": "sum of:",
      "details": [
        {
          "value": 1.0242118835449219,
          "description": "$type:string/description:🍏 [BM25Similarity], result of:",
          "details": [
            {
              "value": 1.0242118835449219,
              "description": "score(freq=1.0), computed as boost * idf * tf from:",
              "details": [
                {
                  "value": 1.8971199989318848,
                  "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                  "details": [
                    { "value": 1, "description": "n, number of documents containing term", "details": [] },
                    { "value": 9, "description": "N, total number of documents with field", "details": [] }
                  ]
                },
                {
                  "value": 0.5398772954940796,
                  "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                  "details": [
                    { "value": 1, "description": "freq, occurrences of term within document", "details": [] },
                    { "value": 1.2000000476837158, "description": "k1, term saturation parameter", "details": [] },
                    { "value": 0.75, "description": "b, length normalization parameter", "details": [] },
                    { "value": 3, "description": "dl, length of field", "details": [] },
                    { "value": 4.888888835906982, "description": "avgdl, average length of field", "details": [] }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  }
]
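
The nested scoreDetails tree is easier to check once its leaf values are pulled into a flat object. The following is a minimal sketch, not part of the Atlas API: collectLeaves is a hypothetical helper, and it assumes each leaf description starts with the parameter name followed by a comma, as in the output above. With several query terms the same names would repeat, so it is only meant for one term's branch at a time.

```javascript
// Hypothetical helper: walks a scoreDetails node and collects every leaf
// (a node with an empty "details" array) into a flat object, keyed by the
// short name that precedes the first comma in the leaf description.
function collectLeaves(node, out = {}) {
  if (!node.details || node.details.length === 0) {
    const name = node.description.split(",")[0].trim();
    out[name] = node.value;
    return out;
  }
  for (const child of node.details) {
    collectLeaves(child, out);
  }
  return out;
}

// The tf branch copied from the result above.
const tfBranch = {
  value: 0.5398772954940796,
  description: "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
  details: [
    { value: 1, description: "freq, occurrences of term within document", details: [] },
    { value: 1.2000000476837158, description: "k1, term saturation parameter", details: [] },
    { value: 0.75, description: "b, length normalization parameter", details: [] },
    { value: 3, description: "dl, length of field", details: [] },
    { value: 4.888888835906982, description: "avgdl, average length of field", details: [] }
  ]
};

const leaves = collectLeaves(tfBranch);
console.log(leaves); // an object with the keys freq, k1, b, dl, avgdl
```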

Observations

  • MongoDB Atlas returns a BM25 score that is roughly 2.2× lower than the scores produced by Elasticsearch and ParadeDB for the same query and dataset.
  • The detailed breakdown shows that idf and tf are computed exactly as their formulas state, and no extra boost appears in the output. The gap is therefore a constant factor in the formula itself rather than a difference in term statistics; the section on Elasticsearch and Tantivy below traces it to the (k1 + 1) term.

Next Steps

  1. Validate boost settings: ensure no hidden boost is applied to the index or query.
  2. Compare raw term statistics: confirm that freq, dl, and avgdl match across engines.
  3. Compare formula versions: check whether each engine's tf numerator still includes the (k1 + 1) factor.

Feel free to comment or open a discussion if you see a scaling difference that these checks do not explain.

Scoring breakdown for 🍏 🍌 🍊

The query produced a score of 1.0242118835449219.

IDF calculation (inverse document frequency)

Search result

  • Number of documents containing the term: n = 1
  • Total number of documents with this field: N = 9
idf = log(1 + (N - n + 0.5) / (n + 0.5))
    = log(1 + (9 - 1 + 0.5) / (1 + 0.5))
    = log(6.666666666666667)
    ≈ 1.8971199989318848
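
The same arithmetic can be checked in plain JavaScript. This is a small sketch with a hypothetical idf helper; log is taken as the natural logarithm (Math.log), which is what reproduces the reported value. The result is compared with a tolerance because the value in the score details appears to be a single-precision float.

```javascript
// idf as described in the score details: log(1 + (N - n + 0.5) / (n + 0.5)),
// where log is the natural logarithm.
function idf(N, n) {
  return Math.log(1 + (N - n + 0.5) / (n + 0.5));
}

// 🍏 appears in 1 of the 9 documents.
const idfGreenApple = idf(9, 1);
console.log(idfGreenApple); // close to the reported 1.8971199989318848

// 🍎 appears in 8 of the 9 documents, so it is far less discriminating.
const idfRedApple = idf(9, 8);
console.log(idfRedApple);   // about 0.1625
```

The rare term gets an idf more than ten times larger than the common one, which is why the single 🍏 document ranks first.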

TF calculation (term frequency)

Parameters (Lucene defaults)

  • Term‑saturation parameter: k1 = 1.2000000476837158
  • Length‑normalization parameter: b = 0.75

Document field statistics

  • Average length of the field: avgdl = 44 / 9 ≈ 4.888888835906982
  • Document length (dl): 3
  • Occurrences of the term in this document: freq = 1
tf = freq / (freq + k1 * (1 - b + b * dl / avgdl))
   = 1 / (1 + 1.2000000476837158 × (0.25 + 0.75 × (3 / 4.888888835906982)))
   ≈ 0.5398772954940796
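
The field statistics can be recomputed from the test data. The sketch below assumes that the analyzer produces one token per emoji, so splitting on whitespace gives the same lengths; this is an assumption, although it agrees with the reported dl = 3 and avgdl = 44 / 9. The tf helper is hypothetical and uses k1 = 1.2 and b = 0.75. The long decimal reported for k1 is simply 1.2 stored as a 32-bit float, which Math.fround reproduces.

```javascript
const descriptions = [
  "🍏 🍌 🍊",
  "🍎 🍌 🍊",
  "🍎 🍌 🍊 🍎",
  "🍎 🍌 🍊 🍊 🍊",
  "🍎 🍌 🍊 🌴 🫐 🍈 🍇 🌰",
  "🍎 🍎 🍎 🍎 🍎 🍎",
  "🍎 🍌",
  "🍌 🍊 🌴 🫐 🍈 🍇 🌰 🍎",
  "🍎 🍎 🍌 🍌 🍌"
];

// Assumption: one token per whitespace-separated emoji.
const lengths = descriptions.map((d) => d.trim().split(/\s+/).length);
const totalTokens = lengths.reduce((sum, len) => sum + len, 0);
const avgdl = totalTokens / descriptions.length;

// tf = freq / (freq + k1 * (1 - b + b * dl / avgdl))
function tf(freq, dl, avgdl, k1 = 1.2, b = 0.75) {
  return freq / (freq + k1 * (1 - b + b * dl / avgdl));
}

const tfTopHit = tf(1, lengths[0], avgdl);

console.log(totalTokens);       // 44
console.log(avgdl);             // 44 / 9, about 4.8889
console.log(tfTopHit);          // close to the reported 0.5398772954940796
console.log(Math.fround(1.2));  // 1.2000000476837158
```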

Final score

Parameter

  • Boost: 1.0
score = boost × idf × tf
      = 1.0 × 1.8971199989318848 × 0.5398772954940796
      ≈ 1.0242118835449219

This confirms that Atlas Search uses the same scoring formula as Lucene.

What about Elasticsearch and Tantivy?

Several years ago, Lucene removed the (k1 + 1) factor from the numerator of the BM25 tf term in LUCENE‑8563 (released with Lucene 8.0).
For k1 = 1.2, this change divides every score by k1 + 1 = 2.2 from that version onward.

  • Elasticsearch and Tantivy still use the older formula (with the (k1 + 1) factor).
  • Atlas Search uses the updated Lucene formula, which explains the observed scoring differences.
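
To make the factor concrete, here is a minimal sketch that computes both variants side by side. The bm25 function and its legacy flag are hypothetical names for illustration, not the code of any of these engines. It uses the statistics from the score details above, and the per-document freq and dl values for 🍎 are derived from the test data assuming one token per emoji.

```javascript
const K1 = 1.2;
const B = 0.75;

// Single-term BM25. With legacy = true the tf numerator is multiplied by
// (k1 + 1), as in the formula used before LUCENE-8563.
function bm25(freq, dl, avgdl, N, n, legacy = false) {
  const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
  const norm = K1 * (1 - B + B * dl / avgdl);
  const numerator = legacy ? freq * (K1 + 1) : freq;
  return (idf * numerator) / (freq + norm);
}

// Top hit for 🍏: freq = 1, dl = 3, avgdl = 44 / 9, N = 9, n = 1.
const current = bm25(1, 3, 44 / 9, 9, 1);
const legacy = bm25(1, 3, 44 / 9, 9, 1, true);
console.log(current);          // close to the Atlas score of 1.0242
console.log(legacy / current); // k1 + 1 = 2.2, up to floating-point rounding

// The eight documents containing 🍎 (n = 8), as [freq, dl] pairs.
const redAppleDocs = [[1, 3], [2, 4], [1, 5], [1, 8], [6, 6], [1, 2], [1, 8], [2, 5]];

// Returns document indexes sorted by descending score.
function ranking(useLegacy) {
  return redAppleDocs
    .map(([freq, dl], i) => ({ i, score: bm25(freq, dl, 44 / 9, 9, 8, useLegacy) }))
    .sort((a, b) => b.score - a.score)
    .map((d) => d.i);
}

const sameOrder = ranking(false).join() === ranking(true).join();
console.log(sameOrder); // true: a constant factor does not change the order
```

Because every score is multiplied by the same constant, the two formulas rank documents identically; only the absolute values differ.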

Conclusion

  • MongoDB Atlas Search indexes employ the same BM25 scoring mechanism as Lucene indexes.
  • When comparing Atlas Search with engines that keep the older formula (e.g., Elasticsearch, or ParadeDB, which is built on Tantivy), you may see a score difference of roughly 2.2×.
  • This difference does not affect result ordering: within one engine every score is scaled by the same constant, scores are only used for ranking, and the relative order remains identical.

Text‑search scores are deterministic and based on open‑source formulas. In MongoDB you can request score details in a search query to see all the parameters and calculations that produced a given score:

// Original snippet (kept for reference)
[
  // …
]