About similar tokenized features, maybe I’m misunderstanding, but this seems like a problem for any decoder-like structure.
I didn’t mean to imply it’s a problem, but the interpretation should be different. For example, if at layer N all the number tokens have cos-sim = 1 in the tokenized-feature set, and we then find a downstream feature reading from the ” 9″ token on a specific task, we should conclude it’s reading a more general number direction rather than a specific number direction.
I agree this argument also applies to the normal SAE decoder (if the cos-sim = 1).
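To make the interpretation point concrete, here is a minimal sketch of the check being described: compute pairwise cosine similarities between decoder directions for the digit tokens. The setup (random vectors, ten identical rows standing in for the ” 0″–” 9″ tokenized features) is entirely hypothetical, constructed so that the cos-sim = 1 case above holds by design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: one decoder direction per digit token (" 0" ... " 9").
# All ten rows share a single underlying "number" direction, mimicking the
# cos-sim = 1 case discussed above.
d_model = 64
number_direction = rng.normal(size=d_model)
decoder = np.tile(number_direction, (10, 1))  # rows = digit-token features

# Pairwise cosine similarity between decoder rows.
unit = decoder / np.linalg.norm(decoder, axis=1, keepdims=True)
cos_sim = unit @ unit.T

# If every entry is ~1, a downstream feature that appears to read from the
# " 9" direction is really reading the shared number direction.
print(np.allclose(cos_sim, 1.0))  # → True by construction here
```

In a real model the off-diagonal similarities would be below 1, and the closer they are to 1, the more the "general number direction" reading is warranted over the "specific digit" reading.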