Since the feature activation is just the dot product of the concatenated z vector with the corresponding column of the encoder matrix (plus the encoder bias), we can rewrite it as a sum of n_heads dot products, which lets us look at the direct contribution from each head.
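For concreteness, here is a minimal sketch of that per-head decomposition in NumPy. The names and shapes (z_cat, W_enc, b_enc, n_heads, d_head, d_sae) are illustrative assumptions, not the actual code used for the post:

```python
import numpy as np

# Illustrative (assumed) dimensions and random stand-ins for the real tensors.
n_heads, d_head, d_sae = 12, 64, 4096
rng = np.random.default_rng(0)
z_cat = rng.normal(size=n_heads * d_head)          # concatenated z vector
W_enc = rng.normal(size=(n_heads * d_head, d_sae)) # SAE encoder matrix
b_enc = rng.normal(size=d_sae)                     # encoder bias

feat = 123  # feature we want to attribute

# Full pre-ReLU activation: dot product with the feature's encoder column plus its bias.
pre_act = z_cat @ W_enc[:, feat] + b_enc[feat]

# Split z and the encoder column into per-head chunks and take n_heads dot products.
z_heads = z_cat.reshape(n_heads, d_head)
w_heads = W_enc[:, feat].reshape(n_heads, d_head)
head_contribs = (z_heads * w_heads).sum(axis=-1)   # shape: [n_heads]

# The per-head contributions (plus the bias) sum back to the pre-ReLU activation.
assert np.isclose(head_contribs.sum() + b_enc[feat], pre_act)
```

The decomposition is exact because the concatenation makes the dot product separable over head blocks; the bias is a constant offset shared across heads.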
Nice work. But I have one comment.
The feature activation is the output of ReLU applied to this dot product plus the encoder bias, and ReLU is a non-linear function. So it is not clear that we can find the contribution of each head to the feature activation.
Thanks for the comment! We always use the pre-ReLU feature activation, which is equal to the post-ReLU activation (given that the feature is active) and is a purely linear function of z. Edited the post for clarity.
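A tiny self-contained illustration of the point (hypothetical numbers): for an active feature the pre-ReLU value is positive, so ReLU leaves it unchanged and the linear per-head attribution is exact.

```python
import numpy as np

head_contribs = np.array([0.7, -0.2, 1.1])  # hypothetical per-head dot products
b_enc_feat = -0.3                           # hypothetical encoder bias for the feature

pre_act = head_contribs.sum() + b_enc_feat  # linear in z
post_act = max(pre_act, 0.0)                # ReLU

assert pre_act > 0 and post_act == pre_act  # feature is active, so pre == post
```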
I see. Thanks for the clarification!
This could also be the reason behind the issue mentioned in footnote 5.