Vector Search Metrics: Why Your “Distance” Might Be Wrong

Published: February 6, 2026, 12:02 PM (GMT+9)
3 min read
Source: Dev.to

The Metric Type

The metric type defines “similarity” between the data stored in your vector database. Mathematically, it determines how we measure the distance between two vectors—whether by straight‑line distance, angular difference, or a combination of both.

Choosing the right metric depends on two factors:

  1. The type of data you are working with (text, image, audio, etc.).
  2. The embedding model you are using, which should be aligned with the data type.

The metric is set during index creation. If you pick the wrong one, you’ll be speaking a different “language” than your embedding model. Two vectors might appear close in Euclidean space but be semantically unrelated.
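
To see how this plays out, here is a minimal NumPy sketch with toy 2‑D vectors (standing in for real embeddings) where the two metrics disagree about which document is the “closest” match:

```python
import numpy as np

query = np.array([1.0, 0.0])
doc_a = np.array([10.0, 0.0])  # same direction as the query, larger magnitude
doc_b = np.array([0.8, 0.6])   # nearby point, different direction

def euclidean(u, v):
    return np.linalg.norm(u - v)

def cosine_sim(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Euclidean distance says doc_b is the closer match...
print(euclidean(query, doc_a))   # 9.0
print(euclidean(query, doc_b))   # ~0.63

# ...while cosine similarity says doc_a is a perfect match.
print(cosine_sim(query, doc_a))  # 1.0
print(cosine_sim(query, doc_b))  # 0.8
```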

Below is a breakdown of the three main metric types and when to use each.

1. Cosine Similarity (The NLP Standard)

  • What it measures: The angle between two vectors.
  • What it ignores: Magnitude (length).

For text and NLP, this is usually the default. Modern dense embedding models encode semantic meaning in the direction of the vector, not its length.

Example

  • Vector A: The sentence “I love cats.”
  • Vector B: A 500‑word essay about how much the author loves cats.

Both vectors point in roughly the same direction (high cosine similarity) even though Vector B has a larger magnitude. Cosine similarity ignores length and focuses purely on direction.
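
As a quick NumPy illustration of that magnitude‑invariance (with made‑up numbers standing in for the two embeddings):

```python
import numpy as np

def cosine_sim(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

short_text = np.array([0.3, 0.9, 0.1])  # stand-in for "I love cats."
long_essay = 50 * short_text            # same direction, 50x the magnitude

print(cosine_sim(short_text, long_essay))  # 1.0 -- length is ignored entirely
```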

Note on Sparse vs. Dense: In older sparse models (e.g., TF‑IDF), magnitude represented word frequency. In modern dense embeddings (e.g., OpenAI’s text-embedding-3), magnitude is often normalized or irrelevant to meaning.

2. L2 / Euclidean Distance (The Physical Distance)

  • What it measures: Straight‑line distance between points in space.
  • What it cares about: Both direction and magnitude.

Euclidean distance aligns with our everyday notion of “distance.” It is heavily used in computer vision and audio processing, where magnitude can carry meaningful information (e.g., pixel intensity, color depth).
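
A small sketch of why that matters, using hypothetical pixel‑intensity vectors (two flattened grayscale patches, one dark and one bright):

```python
import numpy as np

# Hypothetical grayscale patches flattened to vectors; values are pixel intensities.
dark_patch   = np.array([0.1, 0.1, 0.1, 0.1])
bright_patch = np.array([0.9, 0.9, 0.9, 0.9])

# Identical direction, so cosine similarity is a perfect 1.0...
cos = np.dot(dark_patch, bright_patch) / (
    np.linalg.norm(dark_patch) * np.linalg.norm(bright_patch)
)
print(cos)  # 1.0

# ...but L2 sees them as far apart, matching the visual difference.
print(np.linalg.norm(dark_patch - bright_patch))  # 1.6
```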

3. Inner Product / IP (The Speed Demon)

  • What it measures: Projection of one vector onto another.
  • What it cares about: Direction and magnitude.

IP is the standard for recommendation systems, where magnitude often reflects the strength of a user’s preference or item popularity. It captures both what is liked (direction) and how much it is liked (magnitude).
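
A minimal recommendation‑flavored sketch (the taste vectors and the popularity scaling are invented for illustration):

```python
import numpy as np

user = np.array([0.9, 0.1])  # hypothetical taste vector: strong on action, mild on romance

# Two items pointing in the same direction, but one scaled up
# (say, by popularity or engagement strength):
niche_action   = np.array([0.8, 0.1])
popular_action = np.array([2.4, 0.3])  # 3x the magnitude, same direction

print(np.dot(user, niche_action))    # 0.73
print(np.dot(user, popular_action))  # 2.19 -- same direction, higher score
```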

The “Free” Performance Hack

If your embedding model outputs normalized vectors (length = 1), Inner Product (IP) and Cosine Similarity produce identical scores, and therefore identical ranking order.

  • Why it matters: IP is computationally cheaper than Cosine, because Cosine must compute both vectors’ norms and divide by them on every comparison, while IP is a single dot product. With already‑normalized vectors, you can set the metric to IP and retain the same ranking while gaining faster search, as the sketch below demonstrates.
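
A quick sanity check of the equivalence (random vectors via NumPy, normalized by hand rather than by any particular embedding model):

```python
import numpy as np

rng = np.random.default_rng(0)
query = rng.normal(size=128)
docs = rng.normal(size=(1000, 128))

# Normalize everything to unit length up front.
query /= np.linalg.norm(query)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

ip_scores = docs @ query  # plain inner product, no norms, no division

# Cosine formula applied explicitly (every norm here is 1.0):
cos_scores = ip_scores / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))

# Identical scores, identical ranking -- IP just skipped the extra work.
print(np.allclose(ip_scores, cos_scores))                             # True
print(np.array_equal(np.argsort(ip_scores), np.argsort(cos_scores)))  # True
```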

TL;DR

  • Don’t leave the metric type on default.
  • Text / RAG: Check your model docs; usually Cosine.
  • Images / Audio: Likely L2 (Euclidean).
  • Normalized embeddings: Use IP for a free speed boost.