tailcalled comments on Coordinate-Free Interpretability Theory

tailcalled 15 Sep 2022 7:46 UTC
LW: 3 AF: 2
0
AF

The data may itself have a privileged basis, but this should be lost as soon as the first linear layer is reached.

Not totally lost if the layer is e.g. a convolutional layer, because while the pixels within the convolutional window can get arbitrarily scrambled, it is not possible for a convolutional layer to scramble things across different windows in different parts of the picture.
- Jacob_Hilton 15 Sep 2022 16:03 UTC
  LW: 3 AF: 3
  2
  AF Parent
  Agreed. Likewise, in a transformer, the token dimension should maintain some relationship with the input and output tokens. This is sometimes taken for granted, but it is a good example of the data preferring a coordinate system. My remark that you quoted only really applies to the channel dimension, across which layers typically scramble everything.