The third term in that. Though it was in a somewhat different context related to the weight partitioning project mentioned in the last paragraph, not SAE training.
Yes, brittle in hyperparameters. It was also just very painful to train in general. I wouldn’t straightforwardly extrapolate our experience to a standard SAE setup though; we had a lot of other things going on in that optimisation.
I see, thanks for sharing!