attention

1 week ago · ai

What I Learned While Trying (and Mostly Failing) to Understand Attention Heads

My initial beliefs: Before digging in, I implicitly believed a few things: - If an attention head consistently attends to a specific token, then that token is…

#attention #transformers #language models #interpretability #machine learning #neural networks #NLP
1 month ago · ai

[Paper] Controlling Changes to Attention Logits

When training transformer models, the stability of the network's weights is critical. The query and key weights are especially problematic because they tend to grow…

#attention #transformer training #learning rate scaling #model stability #research paper