Therefore I would bet on extracting rare features from batches of poorly reconstructed input data, rather than directly using the single datapoint with the worst reconstruction loss. (But maybe this is what you already had in mind?)
Oh no, my idea was to take the top-k worst-reconstructed datapoints when re-initializing (or alternatively, the datapoints with the worst perplexity when run through the full model). Since we’ll likely be re-initializing many dead features at a time, this might pick up on the same feature multiple times.
Would you cluster & then sample uniformly from the worst-k-reconstructed clusters?
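A minimal PyTorch sketch of that cluster-then-sample idea, assuming a toy ReLU autoencoder; the names here (`TinySAE`, `resample_dead_features`, `pool_frac`, the plain k-means loop, and the centroid fallback for empty clusters) are all illustrative choices, not the actual code under discussion:

```python
import torch
import torch.nn.functional as F


class TinySAE(torch.nn.Module):
    """Toy sparse autoencoder, included only to make the sketch runnable."""
    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        self.enc = torch.nn.Linear(d_model, n_features)
        self.dec = torch.nn.Linear(n_features, d_model)

    def forward(self, x):
        feats = F.relu(self.enc(x))
        return self.dec(feats), feats


def kmeans(x: torch.Tensor, k: int, iters: int = 20):
    """Plain k-means on the rows of x; returns (centroids, assignments)."""
    centroids = x[torch.randperm(len(x))[:k]].clone()
    assign = torch.zeros(len(x), dtype=torch.long)
    for _ in range(iters):
        assign = torch.cdist(x, centroids).argmin(dim=1)
        for j in range(k):
            members = x[assign == j]
            if len(members) > 0:
                centroids[j] = members.mean(dim=0)
    return centroids, assign


@torch.no_grad()
def resample_dead_features(sae: TinySAE, acts: torch.Tensor,
                           dead_idx: torch.Tensor, pool_frac: float = 0.05):
    """Re-initialize dead features from clusters of poorly reconstructed activations."""
    recon, _ = sae(acts)
    per_example_loss = (recon - acts).pow(2).mean(dim=-1)          # (N,)

    # Build a pool of worst-reconstructed datapoints, not just the single worst ones.
    pool_size = max(int(len(acts) * pool_frac), len(dead_idx))
    pool = acts[per_example_loss.topk(pool_size).indices]

    # Cluster the pool, then draw one datapoint per cluster, so that several
    # dead features don't all latch onto the same missing direction.
    centroids, assign = kmeans(pool, k=len(dead_idx))
    for slot, feat in enumerate(dead_idx.tolist()):
        members = pool[assign == slot]
        # Sample uniformly within the cluster; fall back to the centroid if empty.
        point = (members[torch.randint(len(members), (1,))].squeeze(0)
                 if len(members) > 0 else centroids[slot])
        direction = F.normalize(point, dim=0)
        sae.dec.weight[:, feat] = direction        # new decoder direction
        sae.enc.weight[feat, :] = direction        # matching encoder row
        sae.enc.bias[feat] = 0.0


# Example: resample 8 dead features of a 64 -> 512 SAE on random activations.
if __name__ == "__main__":
    sae = TinySAE(d_model=64, n_features=512)
    acts = torch.randn(4096, 64)
    dead_idx = torch.arange(8)
    resample_dead_features(sae, acts, dead_idx)
```

Drawing one datapoint per cluster (rather than the k globally worst datapoints) is what keeps several re-initialized features from converging on the same missing direction.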
2) Not being compute-bottlenecked: I do assign decent probability that we will eventually be compute-bottlenecked; my point is that the current bottleneck I see is the number of people working on this. This means, for me personally, focusing on flashy, fun applications of sparse autoencoders.
[As a relative measure, we’re not compute-bottlenecked enough to learn dictionaries on the smaller Pythia models.]