What exactly do you have in mind here?
Oh, just that it preserves distance/L2 norm/angles/orthogonality. I find that this is often an important intuition, since a bunch of operations in transformers depend on orthogonality/norm. In particular, norm is useful for thinking about weight decay, and as a rough heuristic for, e.g., how much information is transferred.
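A quick numerical sketch of the norm/angle point, in NumPy (the matrices and dimensions here are just illustrative, not from any particular model):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16

# Build a random orthonormal matrix (a rotation/reflection) via QR.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))

x = rng.normal(size=d)
y = rng.normal(size=d)

# Orthonormal maps preserve L2 norms and dot products (hence angles)...
assert np.allclose(np.linalg.norm(Q @ x), np.linalg.norm(x))
assert np.allclose((Q @ x) @ (Q @ y), x @ y)

# ...whereas an arbitrary invertible change of basis generally does not.
M = rng.normal(size=(d, d))
print(np.linalg.norm(M @ x), np.linalg.norm(x))  # typically differ
```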
Probably most important is that whenever a layer tries to read a feature from the residual stream, it's basically projecting the residual stream onto a single dimension, which hopefully corresponds to that feature, and, importantly, ignoring all orthogonal dimensions. The projection operation is invariant under rotations (orthonormal changes of basis) but NOT under arbitrary changes of basis.
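And a minimal sketch of the "reading a feature = projection" point, again in NumPy; `resid` and `feature_dir` are hypothetical stand-ins for a residual stream vector and a read direction:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16

resid = rng.normal(size=d)        # residual stream vector (toy example)
feature_dir = rng.normal(size=d)  # hypothetical read direction for a feature
feature_dir /= np.linalg.norm(feature_dir)

# Reading the feature = projecting the residual stream onto one direction.
read = feature_dir @ resid

# Rotate everything: the residual stream and the read direction together.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
read_rotated = (Q @ feature_dir) @ (Q @ resid)
assert np.allclose(read, read_rotated)  # projection unchanged under rotation

# Under an arbitrary invertible change of basis the readout changes,
# since (M f) . (M x) = f . (M^T M) x != f . x in general.
M = rng.normal(size=(d, d))
read_skewed = (M @ feature_dir) @ (M @ resid)
print(read, read_skewed)  # typically differ
```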