Thank you so much for the post! I’m starting to get a sense of induction heads.
This may be an unrelated question: can a single attention head store multiple pieces of orthogonal information? For example, in this post, a layer-0 head may store the information "I follow 'D'". Can it also store information like "I am a noun"?
Or, to put it another way, should an attention head have a single, dedicated functionality?
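To make my question concrete, here is a minimal NumPy sketch of the intuition I have in mind (my own toy example, not from the post): if two features are written along orthogonal directions of a single activation vector, a downstream reader can recover each one independently, so in principle one head's output could carry both.

```python
import numpy as np

d = 64  # hypothetical residual-stream width

# Two orthogonal "feature" directions, standing in for
# "I follow 'D'" and "I am a noun" (illustrative, not real model features).
dir_follows_d = np.zeros(d)
dir_follows_d[0] = 1.0
dir_is_noun = np.zeros(d)
dir_is_noun[1] = 1.0

# A single head's output vector could be a sum of both features at once.
activation = 2.0 * dir_follows_d + (-1.0) * dir_is_noun

# Because the directions are orthogonal, each feature reads out cleanly
# via a dot product, without interference from the other.
read_follows_d = activation @ dir_follows_d  # recovers 2.0
read_is_noun = activation @ dir_is_noun      # recovers -1.0
print(read_follows_d, read_is_noun)
```

So my question is whether real heads actually work this way, or whether something forces each head toward a single dedicated function.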