Connor Kissane comments on Sparse Autoencoders Work on Attention Layer Outputs

Connor Kissane 17 May 2024 8:53 UTC
1 point
1
Thanks for the comment! We always use the pre-ReLU feature activation, which is equal to the post-ReLU activation (given that the feature is active), and is a purely linear function of z. Edited the post for clarity.
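To illustrate the point, here is a minimal sketch in plain Python (not the code used for the post). The names `pre_relu`, `post_relu`, `w_enc`, and `b_enc` are hypothetical, and the encoder for a single feature is assumed to be a dot product with z plus a scalar bias. It checks that the two activations coincide whenever the feature is active, and that the bias-free part of the pre-ReLU activation is additive in z.

```python
import random

def pre_relu(z, w_enc, b_enc):
    """Pre-ReLU activation of one feature: dot(w_enc, z) + b_enc (assumed form)."""
    return sum(w * x for w, x in zip(w_enc, z)) + b_enc

def post_relu(z, w_enc, b_enc):
    """Post-ReLU activation: the pre-ReLU value clipped at zero."""
    return max(0.0, pre_relu(z, w_enc, b_enc))

random.seed(0)
d = 8                                    # toy dimension for z (assumption)
w_enc = [random.gauss(0, 1) for _ in range(d)]
b_enc = 0.1
zs = [[random.gauss(0, 1) for _ in range(d)] for _ in range(100)]

# Whenever the feature is active, pre- and post-ReLU activations are identical.
active = [z for z in zs if post_relu(z, w_enc, b_enc) > 0]
all_match = all(
    pre_relu(z, w_enc, b_enc) == post_relu(z, w_enc, b_enc) for z in active
)

# With the bias set to zero, the pre-ReLU activation is additive in z.
z1, z2 = zs[0], zs[1]
z_sum = [a + b for a, b in zip(z1, z2)]
lin_gap = abs(
    pre_relu(z_sum, w_enc, 0.0)
    - (pre_relu(z1, w_enc, 0.0) + pre_relu(z2, w_enc, 0.0))
)

print(all_match, lin_gap < 1e-9)
```

Under these assumptions the ReLU only decides whether the feature counts as active; on the active set it leaves the value unchanged, so the dependence on z there is the same as for the pre-ReLU activation.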
- Ali Shehper 17 May 2024 14:14 UTC
  1 point
  0
  I see. Thanks for the clarification!