Double thanks for the extended discussion and ideas! Also interested to see what happens.
We earlier created some SAEs that completely remove the unigram directions from the encoder (e.g. old/gpt2_resid_pre_8_t8.pt).
However, a " Golden Gate Bridge" feature individually activates on " Golden" (plus prior context), " Gate" (plus prior context), and " Bridge" (plus prior context). Without the last-token/unigram directions, such features tended not to activate directly, which complicated interpretability.
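For concreteness, here is a minimal sketch (not our actual code) of one way "removing the unigram directions from the encoder" could be implemented: project each encoder row onto the orthogonal complement of a given set of unigram directions. The names `W_enc` and `unigram_dirs` are hypothetical, and this assumes the set of directions is low-rank, since the full token-embedding matrix can span the entire residual stream.

```python
import torch

def remove_unigram_directions(W_enc: torch.Tensor,
                              unigram_dirs: torch.Tensor) -> torch.Tensor:
    """Project SAE encoder weights onto the orthogonal complement of
    the unigram directions.

    W_enc:        (n_features, d_model) encoder weight matrix (hypothetical name).
    unigram_dirs: (n_dirs, d_model) directions to remove, e.g. per-token
                  embedding contributions to the residual stream.
    """
    # Orthonormal basis for the unigram subspace (reduced QR).
    Q, _ = torch.linalg.qr(unigram_dirs.T)  # (d_model, n_dirs)
    # Subtract each encoder row's component lying in that subspace.
    return W_enc - (W_enc @ Q) @ Q.T
```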