lewis smith comments on Open Source Replication & Commentary on Anthropic’s Dictionary Learning Paper

lewis smith 26 Oct 2023 8:36 UTC
1 point
0
Maybe this is really naive (I just randomly thought of it), and you mention you do some obvious stuff like looking at the singular vectors of the activations, which might rule it out. But could the low-frequency cluster be linked to something simple, like the fact that the use of ReLUs, GeLUs, etc. means the neuron activations are going to be biased towards the positive orthant of activation space in terms of magnitude (because the negative components of any vector in the activation basis would be cut off)? I wonder if the singular vectors would catch this.
- Neel Nanda 26 Oct 2023 9:13 UTC
  2 points
  0
  Ah, I did compare it to the mean activations and didn’t find much, alas. Good idea though!
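
For anyone who wants to poke at the hypothesis in this thread, here is a toy sketch of the effect being discussed. It is not the check that was actually run on the autoencoder: the data are synthetic Gaussian pre-activations, and the names `relu_bias_check` and `cosine_to_mean` are made up for illustration. It shows that after a ReLU the mean activation is positive in every neuron, and that the top singular vector of the uncentered activation matrix lines up with that mean direction, so comparing a candidate direction against the mean activation is a reasonable way to test the idea.

```python
import numpy as np


def relu_bias_check(n_samples=2000, n_neurons=64, seed=0):
    """Toy model: i.i.d. Gaussian pre-activations passed through a ReLU.

    Returns the mean activation vector, the absolute cosine similarity
    between the mean direction and the top right singular vector of the
    (uncentered) activation matrix, and the singular values.
    """
    rng = np.random.default_rng(seed)
    pre_acts = rng.standard_normal((n_samples, n_neurons))
    acts = np.maximum(pre_acts, 0.0)  # ReLU cuts off negative components

    mean_act = acts.mean(axis=0)
    mean_dir = mean_act / np.linalg.norm(mean_act)

    # SVD of the uncentered activations: rows are samples, columns neurons.
    _, sing_vals, vt = np.linalg.svd(acts, full_matrices=False)
    top_dir = vt[0]

    # Singular vectors are only defined up to sign, so take the absolute value.
    cos_top_mean = abs(float(top_dir @ mean_dir))
    return mean_act, cos_top_mean, sing_vals


def cosine_to_mean(direction, mean_act):
    """Cosine similarity between a candidate direction and the mean activation."""
    direction = np.asarray(direction, dtype=float)
    return float(direction @ mean_act
                 / (np.linalg.norm(direction) * np.linalg.norm(mean_act)))


if __name__ == "__main__":
    mean_act, cos_top_mean, sing_vals = relu_bias_check()
    print("all mean activations positive:", bool((mean_act > 0).all()))
    print("|cos(top singular vector, mean direction)|:", cos_top_mean)
    print("ratio of first to second singular value:", sing_vals[0] / sing_vals[1])
```

In this toy setting the positive-orthant bias shows up as the dominant singular direction. If the low-frequency cluster's shared direction had a high `cosine_to_mean` against the real mean activations, that would support the hypothesis; a low value, as reported in the reply above, counts against it.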