How AI Knows a Cat Is Like a Dog: An Intuitive Guide to Word Embeddings

Published: December 22, 2025 at 03:53 AM EST
2 min read
Source: Dev.to

Intuition Behind Word Embeddings

Imagine doing math with ideas. A classic example:

King - Man + Woman ≈ Queen

This illustrates the power of static embeddings such as GloVe (Global Vectors for Word Representation). GloVe scans massive corpora, counts how often words appear near each other, and assigns each word a fixed numerical vector. Because these vectors capture “meaning,” semantically similar words end up close together.
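
The analogy is easy to reproduce with pretrained GloVe vectors. A minimal sketch using gensim's downloader API (the glove-wiki-gigaword-100 model is one choice among several; it is fetched once and cached):

import gensim.downloader as api

# One-time download of pretrained 100-dimensional GloVe vectors.
glove = api.load("glove-wiki-gigaword-100")

# Vector arithmetic: king - man + woman, then find the nearest word.
print(glove.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# Expected output is of the form [('queen', ...)], i.e. "queen" ranks closest.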

The Polysemy Problem

Static models struggle with polysemy—words that have multiple meanings.

  • I need to go to the bank to deposit some money. (financial institution)
  • We sat on the bank of the river. (river edge)

In a static model like GloVe, “bank” has a single vector that averages across all contexts, so it cannot distinguish these senses.
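
You can see this directly in code: a static model is just a lookup table, so the same word always maps to the same row. A minimal sketch, assuming the gensim GloVe vectors loaded above:

import numpy as np

# Static lookup: the vector for "bank" is fixed, regardless of the sentence.
vec_finance = glove["bank"]   # "...go to the bank to deposit some money."
vec_river   = glove["bank"]   # "...sat on the bank of the river."

print(np.allclose(vec_finance, vec_river))  # True: one word, one vector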

Dynamic (Contextual) Embeddings

Contextual embeddings (e.g., BERT – Bidirectional Encoder Representations from Transformers) generate a unique vector for each word occurrence by looking at the entire sentence. When BERT processes the two “bank” sentences, it produces distinct vectors because the surrounding words (“river,” “deposit”) guide its interpretation.

Simple BERT Usage with PyTorch

import torch
from transformers import BertTokenizer, BertModel

# Load tokenizer and model
tokenizer = BertTokenizer.from_pretrained('./bert_model')  # local copy; a Hub name such as 'bert-base-uncased' also works
model_bert = BertModel.from_pretrained('./bert_model')
model_bert.eval()  # Evaluation mode: disables dropout so outputs are deterministic

def print_token_embeddings(sentence: str, label: str):
    """
    Tokenizes a sentence, runs it through BERT,
    and prints the first 5 values of each token's embedding.
    """
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model_bert(**inputs)
    embeddings = outputs.last_hidden_state[0]
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

    print(f"\n--- {label} ---")
    for token, vector in zip(tokens, embeddings):
        print(f"{token}: {vector[:5].tolist()}")

Discussion Prompt: If you were training a model from scratch today, what specific vocabulary or niche topic would you want it to learn first? Share your thoughts in the comments.

All illustrations in this post were generated using DALL·E 3.

References

  • Devlin, J. et al. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
  • Stanford NLP Group. GloVe: Global Vectors for Word Representation.
  • Spot Intelligence. GloVe Embeddings Explained.