attention

1 week ago · ai

What I Learned Trying (and Mostly Failing) to Understand Attention Heads

What I initially believed

Before digging in, I implicitly believed a few things:

- If an attention head consistently attends to a specific token, that token is...

#attention #transformers #language models #interpretability #machine learning #neural networks #NLP

1 month ago · ai

[Paper] Controlling changes to attention logits

Stability of neural network weights is critical when training transformer models. The query and key weights are particularly problematic, as they tend to grow l...

#attention #transformer training #learning rate scaling #model stability #research paper