Noa Nabeshima comments on Sparse Autoencoders Work on Attention Layer Outputs

Noa Nabeshima 2 Mar 2024 1:58 UTC
LW: 2 AF: 1
I wonder whether multiple heads sharing the same activation pattern in a context is related to the limited rank of each head. Once the VO subspace of each head is saturated with meaningful directions/features, maybe the model uses multiple heads together to write out features that can't be placed in the subspace of any one head.
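To make the rank argument concrete, here is a toy NumPy sketch. It only illustrates the linear algebra and is not evidence about what trained models do. The sizes `d_model` and `d_head`, the random weights, and the helpers `random_ov` and `fraction_outside` are all hypothetical, chosen for illustration. Each head's combined value-output map has rank at most `d_head`, so a feature direction built from two heads' outputs generally lies outside either head's write subspace but inside their sum.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_head = 16, 4  # hypothetical toy sizes, not taken from any real model


def random_ov(rng):
    """Random stand-in for one head's value-output map (d_model x d_model)."""
    w_v = rng.standard_normal((d_model, d_head))
    w_o = rng.standard_normal((d_head, d_model))
    return w_v @ w_o  # rank is at most d_head


def fraction_outside(direction, write_map):
    """Fraction of `direction`'s norm lying outside the row space of `write_map`.

    A head's output is x @ write_map, so it always lies in that row space.
    """
    _, s, vt = np.linalg.svd(write_map)
    basis = vt[: int(np.sum(s > 1e-10 * s[0]))]  # orthonormal rows spanning the subspace
    projection = basis.T @ (basis @ direction)
    return np.linalg.norm(direction - projection) / np.linalg.norm(direction)


ov_a, ov_b = random_ov(rng), random_ov(rng)
both = np.vstack([ov_a, ov_b])  # row space = sum of the two heads' write subspaces

rank_a = np.linalg.matrix_rank(ov_a)
rank_b = np.linalg.matrix_rank(ov_b)
rank_both = np.linalg.matrix_rank(both)

# A feature direction that needs contributions from both heads.
feature = rng.standard_normal(d_model) @ ov_a + rng.standard_normal(d_model) @ ov_b

outside_a = fraction_outside(feature, ov_a)
outside_b = fraction_outside(feature, ov_b)
outside_both = fraction_outside(feature, both)

print(rank_a, rank_b, rank_both)
print(outside_a, outside_b, outside_both)
```

With random weights the two write subspaces are generically independent, so the combined rank is the sum of the per-head ranks and `outside_both` is numerically zero while `outside_a` and `outside_b` are not. Whether trained heads actually coordinate this way is exactly the open question above.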