# Attention Explained Like You're 5
Source: Dev.to
## What is Attention in AI?
Attention works like a highlighter for a language model.
When you study, you underline the parts of the text that are important for the exam and ignore the rest.
In the same way, an AI model assigns higher "attention scores" to the words that are most relevant to understanding the current word.
## Example: Disambiguating "bank"
Sentence: "The bank by the river had no money."
Without attention, an older model might guess the meaning of *bank* with roughly 50/50 odds between:
- Bank (financial institution)
- Bank (edge of a river)
With attention, the model looks at surrounding words:
- bank → "river" (strong connection)
- bank → "money" (weaker connection, because the sentence says "no money")
The stronger link to "river" leads the model to interpret *bank* as a riverbank.
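The idea can be sketched in a few lines of Python. The scores below are made up for illustration; a real model learns them from data:

```python
# Hypothetical attention scores from "bank" to its context words.
context_scores = {"river": 0.8, "money": 0.2}  # made-up numbers

# The stronger connection decides which sense of "bank" wins.
if context_scores["river"] > context_scores["money"]:
    sense = "riverbank"
else:
    sense = "financial institution"

print(sense)  # riverbank
```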
## How Attention Scores Words
Consider the sentence:
"The cat sat because it was tired."
When the model processes the pronoun it, it evaluates the relevance of each other word:
| Word | Attention score |
|---|---|
| cat | high (very relevant) |
| sat | low |
| tired | medium |
Thus the model infers that it refers to the cat.
In a more visual form, the scores assigned while processing *it*:

| The | cat | sat | because | it | was | tired |
|---|---|---|---|---|---|---|
| low | high | low | low | - | low | medium |
Higher scores mean more attention, indicating greater relevance to the word being processed.
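In practice, raw relevance scores are turned into attention weights with a softmax, so they are all positive and sum to 1. A small Python sketch with made-up scores for the word *it*:

```python
import math

# Hypothetical raw relevance scores for each word, from the
# perspective of the pronoun "it" (made-up numbers).
words = ["The", "cat", "sat", "because", "was", "tired"]
raw   = [0.1,   2.0,   0.3,   0.1,       0.2,  1.2]

# Softmax: exponentiate, then normalize so the weights sum to 1.
exps = [math.exp(s) for s in raw]
total = sum(exps)
weights = [e / total for e in exps]

for word, weight in zip(words, weights):
    print(f"{word:8s} {weight:.2f}")

# "cat" ends up with the largest weight, so the model
# resolves "it" to the cat.
```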
## Why Attention Matters
Before attention mechanisms, models read one word at a time and quickly lost earlier context. Attention enables them to:
- Translate languages more accurately
- Understand and answer questions
- Generate coherent paragraphs
- Assist with coding tasks
By focusing on the most relevant parts of the text, attention lets AI grasp context much like a human highlights important passages.
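For readers who want one level more detail: in transformer models, the scores come from comparing a "query" vector for the current word against "key" vectors for every word, a formulation known as scaled dot-product attention. A minimal pure-Python sketch with tiny made-up vectors:

```python
import math

def attention(query, keys, values):
    """Scaled dot-product attention for one query (toy sketch)."""
    d = len(query)
    # Score each key against the query, scaled by sqrt(dimension).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # Softmax turns scores into weights that sum to 1.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The output is a weighted average of the value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# Made-up 2-d vectors: the query matches the first key more closely,
# so the output leans toward the first value vector.
q = [1.0, 0.0]
ks = [[1.0, 0.0], [0.0, 1.0]]
vs = [[10.0, 0.0], [0.0, 10.0]]
print(attention(q, ks, vs))
```

Because the query lines up with the first key, the first value vector gets the larger weight, which is exactly the "highlighter" behavior described above.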