Maxime Riché comments on The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable

Maxime Riché 29 Nov 2022 13:30 UTC
2 points
Should we expect these decompositions to be even more interpretable if the model were trained to output a prediction as early as possible, i.e. after every block rather than only after the full network?
- beren 2 Dec 2022 15:59 UTC
  1 point
  Potentially? My suspicion is that in this case the residual stream basis would be extremely aligned with the output basis, whereas at the moment there is no real pressure for it to be (though it seems fairly output-aligned regardless, which is convenient for us). This might be a fun thing to fine-tune on.
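A minimal sketch of the objective discussed in this thread, namely asking the model to predict the next token after every block rather than only at the end. It assumes the residual stream after each block is read out directly through the unembedding matrix `W_U` and scored with cross-entropy, with the per-block losses averaged. The function names, the uniform weighting, and the choice to skip the final LayerNorm are illustrative assumptions, not anything stated by the commenters.

```python
import numpy as np

def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def per_block_prediction_loss(residuals, W_U, targets, weights=None):
    """Weighted mean next-token cross-entropy, unembedding the residual
    stream after every block instead of only after the last one.

    residuals: list of (seq_len, d_model) arrays, one per block output
    W_U:       (d_model, vocab_size) unembedding matrix
    targets:   (seq_len,) int array of next-token ids
    weights:   optional per-block weights (hypothetical; defaults to uniform)

    Assumption: no final LayerNorm is applied before unembedding, which a
    real model would normally do.
    """
    n_blocks = len(residuals)
    if weights is None:
        weights = np.full(n_blocks, 1.0 / n_blocks)
    positions = np.arange(len(targets))
    total = 0.0
    for w, resid in zip(weights, residuals):
        logp = log_softmax(resid @ W_U)          # (seq_len, vocab_size)
        nll = -logp[positions, targets].mean()   # cross-entropy for this block
        total += w * nll
    return float(total)
```

With all-zero residual streams every block predicts a uniform distribution, so the loss equals `log(vocab_size)`. Putting extra weight on early blocks would push harder toward the output-aligned residual basis suggested above.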