Some hypotheses about the function of these outliers: they could play a large-scale bias or normalization role; they could be ‘empty’ dimensions into which attention heads or MLPs write scratch or garbage values; or they could play some other important role in the network’s computation.
If the outliers were garbage values, wouldn’t that predict that zero ablation doesn’t increase loss much?
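To make the prediction concrete, here is a minimal toy sketch of what a zero-ablation comparison looks like: fabricate residual-stream activations with one artificially large ‘outlier’ dimension, zero that dimension, and compare loss before and after. Everything here (the dimension index, the linear readout, the MSE loss) is a made-up stand-in, not any particular model.

```python
import numpy as np

# Toy residual-stream activations with one hypothetical "outlier" dimension.
rng = np.random.default_rng(0)
d_model = 8
resid = rng.normal(size=(100, d_model))
resid[:, 3] += 10.0                  # dim 3 plays the outlier role (assumption)

# A fake linear readout; targets are defined so the baseline loss is ~0.
w = rng.normal(size=d_model)
targets = resid @ w

def mse_loss(acts: np.ndarray) -> float:
    return float(np.mean((acts @ w - targets) ** 2))

baseline = mse_loss(resid)

# Zero ablation: overwrite the outlier dimension with zeros, keep the rest.
ablated = resid.copy()
ablated[:, 3] = 0.0
ablated_loss = mse_loss(ablated)

print(f"baseline loss: {baseline:.4f}, ablated loss: {ablated_loss:.4f}")
```

If the outlier dimension carried only garbage, `ablated_loss` should stay close to `baseline`; a large jump, as in this deliberately rigged toy, is the signature of a dimension the downstream computation actually reads.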