Yes to both! We varied the expansion size for the tokenized SAEs (8x-32x) and the baselines (4x-64x); results are in the Google Drive folder `expansion-sweep`. Just to be clear, our focus was on learning so-called “complex” features that do not activate solely based on the last token. So, we did not use the lookup biases as additional features, only for the decoder reconstruction (a minimal sketch of that distinction below).
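For concreteness, here is a minimal PyTorch sketch of that setup (all names here are hypothetical, not our actual code): the per-token lookup bias enters only the decoder's reconstruction, while the encoder computes features without it.

```python
import torch
import torch.nn as nn

class TokenizedSAESketch(nn.Module):
    """Sketch: per-token lookup biases used only for decoder reconstruction."""

    def __init__(self, d_model: int, d_sae: int, vocab_size: int):
        super().__init__()
        self.W_enc = nn.Parameter(torch.randn(d_model, d_sae) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(d_sae))
        self.W_dec = nn.Parameter(torch.randn(d_sae, d_model) * 0.01)
        self.b_dec = nn.Parameter(torch.zeros(d_model))
        # One bias vector per vocabulary token (the lookup table).
        self.token_bias = nn.Embedding(vocab_size, d_model)

    def forward(self, x: torch.Tensor, token_ids: torch.Tensor):
        # Encoder: no token lookup here, so feature activations must
        # capture more than the identity of the last token.
        f = torch.relu(x @ self.W_enc + self.b_enc)
        # Decoder: the lookup bias absorbs the unigram/last-token part
        # of the reconstruction.
        x_hat = f @ self.W_dec + self.b_dec + self.token_bias(token_ids)
        return x_hat, f
```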
That said, ~25% of the suggested 64x baseline features are similar to the token-biases (cosine similarity 0.4-0.9). In fact, letting the token-biases evolve during training substantially increases this similarity (see figure). At smaller expansion sizes, up to 66% of features are similar, with fewer dead features. (related sections above: ‘Dead features’ and ‘Measuring “simple” features’)
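Roughly, the comparison behind those percentages looks like this (a hypothetical helper, not our exact evaluation code; assumes decoder rows and token-biases both live in the model's residual space):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def fraction_token_like(W_dec: torch.Tensor, token_bias: torch.Tensor,
                        threshold: float = 0.4) -> float:
    """Fraction of decoder features whose best-matching token-bias
    exceeds a cosine-similarity threshold."""
    feats = F.normalize(W_dec, dim=-1)        # (d_sae, d_model)
    biases = F.normalize(token_bias, dim=-1)  # (vocab, d_model)
    sims = feats @ biases.T                   # all pairwise cosines
    best = sims.max(dim=-1).values            # best match per feature
    return (best > threshold).float().mean().item()
```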
Double thanks for the extended discussion and ideas! Also interested to see what happens.
We earlier created some SAEs that completely remove the unigram directions from the encoder (e.g. `old/gpt2_resid_pre_8_t8.pt`). However, a “ Golden Gate Bridge” feature individually activates on “ Golden” (plus prior context), “ Gate” (plus prior context), and “ Bridge” (plus prior context). Without the last-token/unigram directions, these features tended not to activate directly, which complicated interpretability.
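For readers wondering what removing the unigram directions from the encoder can look like mechanically, here is one plausible sketch (a simplification, not necessarily our exact recipe): each encoder feature has its component along its best-matching token-bias direction projected out.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def strip_matched_unigram(W_enc: torch.Tensor,
                          token_bias: torch.Tensor) -> torch.Tensor:
    """Remove, from each encoder feature, its component along the single
    token-bias direction it most resembles."""
    dirs = F.normalize(token_bias, dim=-1)           # (vocab, d_model)
    cols = W_enc.T                                   # (d_sae, d_model)
    match = (F.normalize(cols, dim=-1) @ dirs.T).argmax(dim=-1)
    u = dirs[match]                                  # matched unit directions
    coeff = (cols * u).sum(dim=-1, keepdim=True)     # projection coefficients
    return (cols - coeff * u).T                      # back to (d_model, d_sae)
```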