SAEs are early enough that there’s tons of low-hanging fruit and ideas to try. They also require relatively little compute (often around $1 for a training run), so AI agents could afford to test many ideas. I wouldn’t be surprised if SAE improvements were a good early target for automated AI research, especially if the feedback loop is just “Come up with idea, modify existing loss function, train, evaluate, get a quantitative result”.
> They also require relatively little compute (often around $1 for a training run), so AI agents could afford to test many ideas.
Ok, this seems surprisingly cheap. Can you say more about what such a $1 training run typically looks like (what the hyperparameters are)? I’d also be very interested in any analysis of how SAE training costs (compute-wise) scale relative to base LLM pretraining costs.
> I wouldn’t be surprised if SAE improvements were a good early target for automated AI research, especially if the feedback loop is just “Come up with idea, modify existing loss function, train, evaluate, get a quantitative result”.
This sounds spiritually quite similar to what’s already been done in Discovering Preference Optimization Algorithms with and for Large Language Models, and I’d expect something roughly like that would probably produce something interesting, especially if a training run only costs $1.

A $1 training run would be training 6 SAEs across 6 sparsities at 16K width on Gemma-2-2B for 200M tokens. This includes generating the activations; it would be cheaper if the activations were precomputed. In practice this seems like large enough scale to validate ideas such as the Matryoshka SAE or the BatchTopK SAE.
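For concreteness, a run at roughly that scale might look like the sketch below: a 16K-width TopK-style SAE swept over several sparsity levels on precomputed Gemma-2-2B residual-stream activations. The sparsity values, learning rate, and `acts_loader` placeholder are illustrative assumptions, not taken from any particular paper or codebase.

```python
# Illustrative sketch only: a TopK-style SAE sweep at roughly the scale
# described above. Widths match Gemma-2-2B (d_model=2304) and a 16K dictionary;
# sparsity levels, learning rate, and the data loader are made-up placeholders.
import torch
import torch.nn as nn


class TopKSAE(nn.Module):
    """Sparse autoencoder that keeps only the top-k latents per example.
    (A BatchTopK variant would instead keep the top k * batch_size
    activations pooled across the whole batch.)"""

    def __init__(self, d_model: int, d_sae: int, k: int):
        super().__init__()
        self.k = k
        self.W_enc = nn.Parameter(torch.randn(d_model, d_sae) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(d_sae))
        self.W_dec = nn.Parameter(torch.randn(d_sae, d_model) * 0.01)
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def forward(self, x):
        pre = (x - self.b_dec) @ self.W_enc + self.b_enc
        top = torch.topk(pre, self.k, dim=-1)
        acts = torch.zeros_like(pre).scatter_(-1, top.indices, top.values.relu())
        recon = acts @ self.W_dec + self.b_dec
        return recon, acts


d_model, d_sae = 2304, 16_384  # Gemma-2-2B residual width, 16K latents
device = "cuda" if torch.cuda.is_available() else "cpu"


def acts_loader():
    # Placeholder: in practice, stream ~200M tokens of precomputed
    # residual-stream activations from disk instead of random data.
    for _ in range(100):
        yield torch.randn(4096, d_model)


for k in [20, 40, 80, 160, 320, 640]:  # the "6 sparsities" of the sweep
    sae = TopKSAE(d_model, d_sae, k).to(device)
    opt = torch.optim.Adam(sae.parameters(), lr=3e-4)
    for x in acts_loader():
        x = x.to(device)
        recon, _ = sae(x)
        loss = (recon - x).pow(2).mean()  # TopK enforces sparsity directly, so no L1 term
        opt.zero_grad()
        loss.backward()
        opt.step()
```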
Yeah, if you’re doing this, you should definitely precompute and save the activations.
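If you do precompute, a minimal sketch of that caching step with plain Hugging Face transformers is below; the layer index, the tiny stand-in corpus, and the output filename are arbitrary choices for illustration.

```python
# Illustrative sketch: precompute and save Gemma-2-2B residual-stream
# activations so SAE training never has to run the base model again.
# Layer index, corpus, and file format are arbitrary choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "google/gemma-2-2b"
LAYER = 12  # which residual stream to cache (arbitrary choice)

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)
model.eval()

texts = ["The quick brown fox jumps over the lazy dog."]  # stand-in corpus

chunks = []
with torch.no_grad():
    for text in texts:
        inputs = tok(text, return_tensors="pt")
        out = model(**inputs, output_hidden_states=True)
        # hidden_states[LAYER] is [batch, seq_len, d_model]; flatten over tokens.
        chunks.append(out.hidden_states[LAYER].squeeze(0).float().cpu())

torch.save(torch.cat(chunks), "gemma2_2b_layer12_acts.pt")
```

As a rough size estimate, 200M tokens at d_model=2304 is on the order of 1–2 TB in fp32 (about half that in bf16), so in practice you’d shard the cache across files and probably keep it in half precision.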