Atlas Search score details (the BM25 calculation)

Published: 5 days ago (December 20, 2025, 07:05 AM GMT+9)

Revisiting “Text Search With MongoDB (BM25 TF‑IDF) and PostgreSQL”

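In October 2023, Franck Pachot of MongoDB (love your work!) published a post comparing text search in MongoDB and PostgreSQL (using both the built-in tsvector and ParadeDB's pg_search extension). I won't recap the whole post, but the key point is that MongoDB behaved as expected and returned BM25 scores that matched the theoretical calculation.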

Score details

The MongoDB Atlas Score Details documentation explains how the score is computed. Below is the test case I used (the same as in my previous blog post).

Test data

// Start from an empty collection (drop() also removes existing documents and indexes)
db.articles.drop();

db.articles.insertMany([
  { description: "🍏 🍌 🍊" },                     // short, 1 🍏
  { description: "🍎 🍌 🍊" },                     // short, 1 🍎
  { description: "🍎 🍌 🍊 🍎" },                  // larger, 2 🍎
  { description: "🍎 🍌 🍊 🍊 🍊" },               // larger, 1 🍎
  { description: "🍎 🍌 🍊 🌴 🫐 🍈 🍇 🌰" },      // large, 1 🍎
  { description: "🍎 🍎 🍎 🍎 🍎 🍎" },           // large, 6 🍎
  { description: "🍎 🍌" },                       // very short, 1 🍎
  { description: "🍌 🍊 🌴 🫐 🍈 🍇 🌰 🍎" },      // large, 1 🍎
  { description: "🍎 🍎 🍌 🍌 🍌" }               // shorter, 2 🍎
]);

db.articles.createSearchIndex("default", {
  mappings: { dynamic: true }
});

Query with score details

db.articles.aggregate([
  {
    $search: {
      text: { query: ["🍎", "🍏"], path: "description" },
      index: "default",
      scoreDetails: true
    }
  },
  {
    $project: {
      _id: 0,
      description: 1,
      score: { $meta: "searchScore" },
      scoreDetails: { $meta: "searchScoreDetails" }
    }
  },
  { $sort: { score: -1 } },
  { $limit: 1 }
]);

Result

[
  {
    "description": "🍏 🍌 🍊",
    "score": 1.0242118835449219,
    "scoreDetails": {
      "value": 1.0242118835449219,
      "description": "sum of:",
      "details": [
        {
          "value": 1.0242118835449219,
          "description": "$type:string/description:🍏 [BM25Similarity], result of:",
          "details": [
            {
              "value": 1.0242118835449219,
              "description": "score(freq=1.0), computed as boost * idf * tf from:",
              "details": [
                {
                  "value": 1.8971199989318848,
                  "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                  "details": [
                    { "value": 1, "description": "n, number of documents containing term", "details": [] },
                    { "value": 9, "description": "N, total number of documents with field", "details": [] }
                  ]
                },
                {
                  "value": 0.5398772954940796,
                  "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                  "details": [
                    { "value": 1, "description": "freq, occurrences of term within document", "details": [] },
                    { "value": 1.2000000476837158, "description": "k1, term saturation parameter", "details": [] },
                    { "value": 0.75, "description": "b, length normalization parameter", "details": [] },
                    { "value": 3, "description": "dl, length of field", "details": [] },
                    { "value": 4.888888835906982, "description": "avgdl, average length of field", "details": [] }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  }
]
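The nested scoreDetails tree is easier to read once the leaf parameters are pulled out by name. Below is a minimal sketch in plain JavaScript. The helper collectParams is hypothetical (it is not part of MongoDB or mongosh), and it assumes each leaf description starts with a short name followed by a comma, as in the result above. The sample object is a trimmed copy of that result.

```javascript
// Trimmed copy of the scoreDetails tree returned by the query above.
const scoreDetails = {
  value: 1.0242118835449219,
  description: "sum of:",
  details: [{
    value: 1.0242118835449219,
    description: "score(freq=1.0), computed as boost * idf * tf from:",
    details: [
      {
        value: 1.8971199989318848,
        description: "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
        details: [
          { value: 1, description: "n, number of documents containing term", details: [] },
          { value: 9, description: "N, total number of documents with field", details: [] },
        ],
      },
      {
        value: 0.5398772954940796,
        description: "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
        details: [
          { value: 1, description: "freq, occurrences of term within document", details: [] },
          { value: 1.2000000476837158, description: "k1, term saturation parameter", details: [] },
          { value: 0.75, description: "b, length normalization parameter", details: [] },
          { value: 3, description: "dl, length of field", details: [] },
          { value: 4.888888835906982, description: "avgdl, average length of field", details: [] },
        ],
      },
    ],
  }],
};

// Hypothetical helper: walk the tree and collect leaf values keyed by their short name.
// Assumption: a leaf description looks like "<name>, <explanation>".
function collectParams(node, out = {}) {
  if (node.details.length === 0) {
    const name = node.description.split(",")[0].trim();
    out[name] = node.value;
  }
  for (const child of node.details) {
    collectParams(child, out);
  }
  return out;
}

const params = collectParams(scoreDetails);
console.log(params); // keys: n, N, freq, k1, b, dl, avgdl
```

When several query terms match the same document, the tree has one branch per term and the names collide, so a real helper would need to key the values per term.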

Observations

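MongoDB Atlas returns BM25 scores roughly 2.2 times lower than those produced by Elasticsearch and ParadeDB for the same query and dataset.
The detailed breakdown shows that the idf and tf components are computed correctly; the difference appears to come from the final multiplication step (for example, a different boost factor or normalization step applied inside Atlas).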

Next steps

Validate boost settings – check that no hidden boost is applied in the index or the query.
Compare raw term frequencies – check that freq, dl, and avgdl match across engines.
Reach out to MongoDB support – share the detailed score breakdown to investigate the scaling factor.

Feel free to comment or open a discussion if you have insights into why Atlas applies this scaling!

Score breakdown for 🍏 🍌 🍊

The score returned by the query is 1.0242118835449219.

IDF calculation (inverse document frequency)

From the search result:

Number of documents containing the term: n = 1
Total number of documents with this field: N = 9

idf = log(1 + (N - n + 0.5) / (n + 0.5))
    = log(1 + (9 - 1 + 0.5) / (1 + 0.5))
    = log(6.666666666666667)
    ≈ 1.8971199989318848
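
The same arithmetic can be checked in a few lines of JavaScript. This is a sketch rather than Atlas code: the function name idf is mine, log is the natural logarithm, and the result differs from the value Atlas reports in the last digits because Lucene computes with 32-bit floats while JavaScript uses 64-bit doubles.

```javascript
// Inverse document frequency as described in the score details:
// idf = log(1 + (N - n + 0.5) / (n + 0.5)), with the natural logarithm.
function idf(n, N) {
  return Math.log(1 + (N - n + 0.5) / (n + 0.5));
}

// 🍏 appears in 1 of the 9 documents: a rare term gets a high idf.
console.log(idf(1, 9)); // about 1.8971

// 🍎 appears in 8 of the 9 documents: a common term gets a low idf.
console.log(idf(8, 9)); // about 0.1625
```

The second call is why the single 🍏 document outranks every 🍎 document in this dataset: a term found almost everywhere carries little weight.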

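TF calculation (term frequency)

Parameters (Lucene defaults)

Term saturation parameter: k1 = 1.2000000476837158
Length normalization parameter: b = 0.75

Document field statistics

Average field length: avgdl = 44 / 9 ≈ 4.888888835906982
Document length (dl): 3
Occurrences of the term in this document: freq = 1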

tf = freq / (freq + k1 * (1 - b + b * dl / avgdl))
   = 1 / (1 + 1.2000000476837158 × (0.25 + 0.75 × (3 / 4.888888835906982)))
   ≈ 0.5398772954940796
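
To see where avgdl = 44 / 9 comes from, the sketch below recomputes the field statistics from the test data and then applies the tf formula. It assumes that splitting on a single space matches how the analyzer tokenized these emoji-only strings, which is consistent with the dl = 3 reported above. The function name tf and the use of exact 1.2 for k1 are my choices, so the last digits differ slightly from the float values Atlas reports.

```javascript
// The nine descriptions from the test data.
const docs = [
  "🍏 🍌 🍊",
  "🍎 🍌 🍊",
  "🍎 🍌 🍊 🍎",
  "🍎 🍌 🍊 🍊 🍊",
  "🍎 🍌 🍊 🌴 🫐 🍈 🍇 🌰",
  "🍎 🍎 🍎 🍎 🍎 🍎",
  "🍎 🍌",
  "🍌 🍊 🌴 🫐 🍈 🍇 🌰 🍎",
  "🍎 🍎 🍌 🍌 🍌",
];

// Assumption: one token per space-separated emoji.
const lengths = docs.map((d) => d.split(" ").length);
const totalTokens = lengths.reduce((sum, len) => sum + len, 0);
const avgdl = totalTokens / docs.length; // 44 tokens across 9 documents

// Term frequency component: tf = freq / (freq + k1 * (1 - b + b * dl / avgdl)).
function tf(freq, dl, avgdl, k1 = 1.2, b = 0.75) {
  return freq / (freq + k1 * (1 - b + b * dl / avgdl));
}

console.log(totalTokens, avgdl);
console.log(tf(1, 3, avgdl)); // about 0.5399 for 🍏 in "🍏 🍌 🍊"
```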

Final score

Parameters

Boost: 1.0

score = boost × idf × tf
      = 1.0 × 1.8971199989318848 × 0.5398772954940796
      ≈ 1.0242118835449219

This confirms that Atlas Search uses the same scoring formula as Lucene.

What about Elasticsearch and Tantivy?

Eight years ago, Lucene removed the (k1 + 1) factor in LUCENE-8563.
With k1 = 1.2, this change lowers scores by a factor of k1 + 1 ≈ 2.2 from that version onward.

Elasticsearch and Tantivy still use the older formula (which includes the (k1 + 1) factor).
Atlas Search uses the updated Lucene formula, which explains the observed score difference.
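
The effect of the (k1 + 1) factor can be reproduced with the numbers from the breakdown above. This is a sketch of the two textbook forms of the formula, not code taken from Lucene, Elasticsearch, or Tantivy; the function bm25 and its legacy flag are hypothetical names.

```javascript
// BM25 score for one term in one document.
// legacy = true multiplies by (k1 + 1), as in the older form of the formula.
function bm25({ n, N, freq, dl, avgdl, k1 = 1.2, b = 0.75, legacy = false }) {
  const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
  const tf = freq / (freq + k1 * (1 - b + b * dl / avgdl));
  const score = idf * tf;
  return legacy ? score * (k1 + 1) : score;
}

// Statistics for 🍏 in "🍏 🍌 🍊", taken from the score details.
const stats = { n: 1, N: 9, freq: 1, dl: 3, avgdl: 44 / 9 };

const currentScore = bm25(stats);
const legacyScore = bm25({ ...stats, legacy: true });

console.log(currentScore);               // about 1.0242, as Atlas Search reports
console.log(legacyScore);                // about 2.2533
console.log(legacyScore / currentScore); // about 2.2, that is k1 + 1
```

Because the factor depends only on k1, it is the same for every term and every document that uses the same k1.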

Conclusion

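MongoDB Atlas Search indexes use the same BM25 scoring mechanism as Lucene indexes.
When you compare Atlas Search with other Lucene-based or Lucene-inspired engines (for example Elasticsearch or Tantivy), scores can differ by a factor of roughly 2.2.
This difference does not affect the order of results: scores are only used for ranking, and the relative order stays the same.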
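
To illustrate that a constant factor leaves the ranking untouched, the sketch below scores all nine test documents for the query terms 🍎 and 🍏 with both forms of the formula and compares the resulting order. It is a simplified model under stated assumptions: tokens are split on spaces, per-term scores are summed, and the helper names termScore and rank are hypothetical.

```javascript
const docs = [
  "🍏 🍌 🍊",
  "🍎 🍌 🍊",
  "🍎 🍌 🍊 🍎",
  "🍎 🍌 🍊 🍊 🍊",
  "🍎 🍌 🍊 🌴 🫐 🍈 🍇 🌰",
  "🍎 🍎 🍎 🍎 🍎 🍎",
  "🍎 🍌",
  "🍌 🍊 🌴 🫐 🍈 🍇 🌰 🍎",
  "🍎 🍎 🍌 🍌 🍌",
];

const K1 = 1.2;
const B = 0.75;

// Assumption: one token per space-separated emoji.
const tokenized = docs.map((d) => d.split(" "));
const avgdl = tokenized.reduce((sum, t) => sum + t.length, 0) / tokenized.length;

// Score of one term for one document; legacy adds the (k1 + 1) factor.
function termScore(term, tokens, legacy) {
  const N = tokenized.length;
  const n = tokenized.filter((t) => t.includes(term)).length;
  const freq = tokens.filter((t) => t === term).length;
  const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
  const tf = freq / (freq + K1 * (1 - B + B * tokens.length / avgdl));
  return idf * tf * (legacy ? K1 + 1 : 1);
}

// Sum the per-term scores, keep matching documents, sort by descending score.
function rank(queryTerms, legacy) {
  return tokenized
    .map((tokens, i) => ({
      description: docs[i],
      score: queryTerms.reduce((sum, term) => sum + termScore(term, tokens, legacy), 0),
    }))
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((r) => r.description);
}

const currentOrder = rank(["🍎", "🍏"], false);
const legacyOrder = rank(["🍎", "🍏"], true);

console.log(currentOrder);
console.log(JSON.stringify(currentOrder) === JSON.stringify(legacyOrder));
```

The comparison only holds when both engines use the same k1, b, tokenization, and index statistics; the sketch isolates the (k1 + 1) factor and nothing else.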

Text search scores are deterministic and based on open-source formulas. In MongoDB, you can request score details in a search query to see every parameter and calculation that produced a given score:

// Original snippet (kept for reference)
[
  // …
]