That’s great thanks!
My suggested experiment to really get at this question (which, if I were in your shoes, I wouldn't want to run because you've already done quite a bit of work on this project, lol):
Compare
1. Baseline 80x expansion (56k features) at k=30
2. Tokenized-learned 8x expansion (50k vocab + 6k features) at k=29 (since the token adds 1 extra feature)
for 300M tokens (I usually don't see improvements past this amount), showing NMSE and CE.
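For concreteness, here's roughly what I have in mind by NMSE and CE. This is just a sketch with my own normalisation and a placeholder hook point/layer, assuming a TransformerLens-style model patched with the SAE reconstruction; it's not taken from the post.

```python
import torch
from transformer_lens import HookedTransformer

def nmse(x: torch.Tensor, x_hat: torch.Tensor) -> torch.Tensor:
    # reconstruction error normalised by the (mean-centred) energy of the activations
    return ((x - x_hat) ** 2).sum() / ((x - x.mean(dim=0)) ** 2).sum()

def patched_ce(model: HookedTransformer, sae, tokens, hook_name="blocks.8.hook_resid_pre"):
    # LM cross-entropy when the activation at hook_name is swapped for its SAE
    # reconstruction (sae(act) is simplified; a tokenized SAE would also need token ids)
    def substitute(act, hook):
        return sae(act)
    return model.run_with_hooks(
        tokens, return_type="loss", fwd_hooks=[(hook_name, substitute)]
    )
```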
If tokenized-SAEs are still better in this experiment, then that’s a pretty solid argument to use these!
If they’re equivalent, then tokenized-SAEs are still way faster to train in this lower expansion range, while having 50k “features” already interpreted.
If tokenized-SAEs are worse, then these tokenized features aren't a good prior to use. Although both sets of features are learned, the difference is that the tokenized SAE always has the same feature for a given token (duh), while baseline SAEs allow whatever combination of features (e.g. features shared across different tokens).
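To make the comparison concrete, here's a rough sketch of the two setups as I picture them: a standard TopK SAE for the baseline, and the same thing plus a fixed per-token lookup vector for the tokenized variant. The exact tokenized formulation, the dimensions, and all names here are my own simplification, not the post's actual code.

```python
import torch
import torch.nn as nn

class TopKSAE(nn.Module):
    # baseline: all k active features come from one learned dictionary
    def __init__(self, d_model: int, n_features: int, k: int):
        super().__init__()
        self.k = k
        self.enc = nn.Linear(d_model, n_features)
        self.dec = nn.Linear(n_features, d_model)

    def forward(self, x):
        acts = torch.relu(self.enc(x))
        vals, idx = acts.topk(self.k, dim=-1)                    # keep top-k activations
        sparse = torch.zeros_like(acts).scatter_(-1, idx, vals)  # zero the rest
        return self.dec(sparse)

class TokenizedTopKSAE(TopKSAE):
    # tokenized variant: the reconstruction also gets a fixed per-token lookup
    # vector, which is why it runs at k=29 learned features vs k=30 for the baseline
    def __init__(self, d_model: int, n_features: int, k: int, vocab_size: int):
        super().__init__(d_model, n_features, k)
        self.token_lookup = nn.Embedding(vocab_size, d_model)

    def forward(self, x, token_ids):
        # the lookup vector is always the same for a given token, unlike the
        # freely recombined dictionary features of the baseline
        return super().forward(x) + self.token_lookup(token_ids)

# the two configurations from the list above (d_model=768 is a placeholder)
baseline  = TopKSAE(d_model=768, n_features=56_000, k=30)
tokenized = TokenizedTopKSAE(d_model=768, n_features=6_000, k=29, vocab_size=50_000)
```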
This is a completely fair suggestion. I’ll look into training a fully-fledged SAE with the same number of features for the full training duration.