Christopher James Hart comments on SAE reconstruction errors are (empirically) pathological

Christopher James Hart 30 Mar 2024 16:38 UTC
3 points
$ϵ$-random is a bad baseline because activation space is not isotropic (or some other reason I do not understand) and this is not actually that unexpected or interesting.
Isn’t this just the answer? To rephrase:
The SAE can only represent a subset of the possible directions in the original activation space, because you are forcing it to compress that space down.

If you take the magnitude of a change along a direction where change matters, and then apply that same magnitude along random dimensions, most of which the model throws away, the result will be a smaller change downstream.
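
To make the intuition concrete, here is a rough toy sketch rather than anything from the original post. It assumes the "model" is just a linear map that reads a `k`-dimensional subspace of a `d`-dimensional activation space and ignores everything else. The sizes, the names `read_basis` and `downstream_change`, and the linear setup are all my own assumptions for illustration.

```python
import numpy as np

def downstream_change(perturbation, read_basis):
    """Norm of the component of the perturbation that the toy model reads."""
    return float(np.linalg.norm(read_basis.T @ perturbation))

rng = np.random.default_rng(0)
d, k = 512, 16  # hypothetical sizes: full space vs. directions the model uses

# Orthonormal basis (d x k) for the subspace the toy model actually reads.
read_basis, _ = np.linalg.qr(rng.standard_normal((d, k)))

eps = 1.0  # common perturbation norm for both cases

# Perturbation that lies entirely in directions where change matters.
structured = read_basis @ rng.standard_normal(k)
structured *= eps / np.linalg.norm(structured)
structured_change = downstream_change(structured, read_basis)

# Perturbations of the same norm in uniformly random directions.
random_changes = []
for _ in range(200):
    v = rng.standard_normal(d)
    v *= eps / np.linalg.norm(v)
    random_changes.append(downstream_change(v, read_basis))
mean_random_change = float(np.mean(random_changes))

print(f"structured: {structured_change:.3f}")
print(f"random (mean of 200): {mean_random_change:.3f}")
print(f"sqrt(k/d): {np.sqrt(k / d):.3f}")
```

In this toy setup the structured perturbation keeps its full norm after the model reads it, while a random one keeps only about $\sqrt{k/d}$ of it, since most of its mass lands in dimensions the model discards. A real network is nonlinear and its sensitive directions are not a clean subspace, so treat this as a picture of the argument, not evidence for it.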