How AI Knows a Cat Is Like a Dog: An Intuitive Guide to Word Embeddings

Published: December 22, 2025 at 03:53 AM EST
2 min read
Source: Dev.to

Intuition Behind Word Embeddings

Imagine doing math with ideas. A classic example:

King - Man + Woman ≈ Queen

This illustrates the power of static embeddings such as GloVe (Global Vectors for Word Representation). GloVe scans massive corpora, counts how often words appear near each other, and assigns each word a fixed numerical vector. Because these vectors capture “meaning,” semantically similar words end up close together.
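
The analogy is easy to reproduce with pretrained GloVe vectors. A minimal sketch using gensim's downloader API (the glove-wiki-gigaword-100 model is one choice among several; it is fetched once and cached):

import gensim.downloader as api

# One-time download of pretrained 100-dimensional GloVe vectors.
glove = api.load("glove-wiki-gigaword-100")

# Vector arithmetic: king - man + woman, then find the nearest word.
print(glove.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# Expected output is of the form [('queen', ...)], i.e. "queen" ranks closest.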

The Polysemy Problem

Static models struggle with polysemy—words that have multiple meanings.

  • I need to go to the bank to deposit some money. (financial institution)
  • We sat on the bank of the river. (river edge)

In a static model like GloVe, “bank” has a single vector that averages across all contexts, so it cannot distinguish these senses.
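
You can see this directly in code: a static model is just a lookup table, so the same word always maps to the same row. A minimal sketch, assuming the gensim GloVe vectors loaded above:

import numpy as np

# Static lookup: the vector for "bank" is fixed, regardless of the sentence.
vec_finance = glove["bank"]   # "...go to the bank to deposit some money."
vec_river   = glove["bank"]   # "...sat on the bank of the river."

print(np.allclose(vec_finance, vec_river))  # True: one word, one vector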

Dynamic (Contextual) Embeddings

Contextual embeddings (e.g., BERT – Bidirectional Encoder Representations from Transformers) generate a unique vector for each word occurrence by looking at the entire sentence. When BERT processes the two “bank” sentences, it produces distinct vectors because the surrounding words (“river,” “deposit”) guide its interpretation.

Simple BERT Usage with PyTorch

import torch
from transformers import BertTokenizer, BertModel

# Load tokenizer and model
tokenizer = BertTokenizer.from_pretrained('./bert_model')  # local copy; a Hub name such as 'bert-base-uncased' also works
model_bert = BertModel.from_pretrained('./bert_model')
model_bert.eval()  # Evaluation mode: disables dropout so outputs are deterministic

def print_token_embeddings(sentence: str, label: str):
    """
    Tokenizes a sentence, runs it through BERT,
    and prints the first 5 values of each token's embedding.
    """
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model_bert(**inputs)
    embeddings = outputs.last_hidden_state[0]
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

    print(f"\n--- {label} ---")
    for token, vector in zip(tokens, embeddings):
        print(f"{token}: {vector[:5].tolist()}")

Discussion Prompt: If you were training a model from scratch today, what specific vocabulary or niche topic would you want it to learn first? Share your thoughts in the comments.

All illustrations in this post were generated using DALL·E 3.

References

  • Devlin, J. et al. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
  • Stanford NLP Group. GloVe: Global Vectors for Word Representation.
  • Spot Intelligence. GloVe Embeddings Explained.