We had done very extensive ablations at small scale where we found TopK to be consistently better than all of the alternatives we iterated through, and by the time we launched the big run we had already worked out how to scale all of the relevant hyperparameters, so we were decently confident.
One reason we might want a progressive code is that it would basically let you train one autoencoder and use it for any k you wanted at test time (which is nice because we don’t really know exactly how to set k for maximum interpretability yet). Unfortunately, this is somewhat worse than training for the specific k you want to use, so our recommendation for now is to train multiple autoencoders.
Also, even with a progressive code, the activations on the margin would not generally be negative (we do apply a ReLU to guarantee that the activations are non-negative, but the (k+1)th value is almost always still substantially positive).
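To make the "use any k at test time" point concrete, here is a minimal sketch (not our actual training code) of a TopK autoencoder whose encoder can be truncated to a different k' at inference. The class and parameter names (`TopKAutoencoder`, `d_model`, `n_latents`) are illustrative assumptions, not anything from the paper; the last few lines just show how you would inspect the (k+1)th pre-activation, which for a trained model is typically still well above zero, which is why truncating below the training k costs you reconstruction quality.

```python
from typing import Optional

import torch
import torch.nn as nn


class TopKAutoencoder(nn.Module):
    """Sketch of a TopK sparse autoencoder; names and layout are assumptions."""

    def __init__(self, d_model: int, n_latents: int, k: int):
        super().__init__()
        self.k = k  # sparsity level used during training
        self.encoder = nn.Linear(d_model, n_latents)
        self.decoder = nn.Linear(n_latents, d_model, bias=False)

    def encode(self, x: torch.Tensor, k: Optional[int] = None) -> torch.Tensor:
        k = self.k if k is None else k            # test-time k may differ from training k
        pre = torch.relu(self.encoder(x))         # ReLU keeps all activations non-negative
        vals, idx = torch.topk(pre, k, dim=-1)    # keep only the k largest latents
        return torch.zeros_like(pre).scatter_(-1, idx, vals)

    def forward(self, x: torch.Tensor, k: Optional[int] = None) -> torch.Tensor:
        return self.decoder(self.encode(x, k))


# Inspect the (k+1)th largest pre-activation: for a trained autoencoder this is
# usually still substantially positive, so the marginal latent is not "free" to drop.
model = TopKAutoencoder(d_model=768, n_latents=4096, k=32)
x = torch.randn(8, 768)
pre = torch.relu(model.encoder(x))
kth_plus_one = torch.topk(pre, model.k + 1, dim=-1).values[..., -1]
print(kth_plus_one)
```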