Logan Riggs comments on A small update to the Sparse Coding interim research report

Logan Riggs 1 May 2023 16:38 UTC
LW: 5 AF: 4
As (maybe) mentioned in the slides, this method may not be computationally feasible for SOTA models, but I’m interested in the order in which features become monosemantic; if the most important features become monosemantic first, then you might not need full monosemanticity.
Based on the superposition paper, I initially expect the “most important & frequent” features to become monosemantic first. AFAIK, though, this method only captures the most frequent ones, because “importance” would be w/ respect to the CE-loss of the model’s output, which isn’t captured in the reconstruction/L1 loss.
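
To make the point about the loss concrete, here is a minimal sketch of a reconstruction + L1 objective, assuming a one-layer ReLU autoencoder. The function name, parameter names, and `l1_coeff` value are illustrative assumptions, not the report's actual code. Note that the loss only ever sees the activations `x`, never the model's output logits, so a feature's effect on CE-loss can't enter it; a feature matters to this objective only through how often (and how strongly) it shows up in `x`.

```python
import numpy as np

def sparse_autoencoder_loss(x, W_enc, b_enc, W_dec, l1_coeff=1e-3):
    """Reconstruction + L1 loss for a one-layer ReLU autoencoder.

    Illustrative assumption, not the report's implementation.
    x:     (batch, d_act) activations sampled from the model
    W_enc: (d_act, n_dict) encoder weights
    b_enc: (n_dict,) encoder bias
    W_dec: (n_dict, d_act) decoder weights (dictionary features as rows)
    """
    codes = np.maximum(x @ W_enc + b_enc, 0.0)            # sparse feature coefficients
    x_hat = codes @ W_dec                                 # reconstructed activations
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))     # squared error per datapoint
    sparsity = np.mean(np.sum(np.abs(codes), axis=1))     # L1 norm of codes per datapoint
    return recon + l1_coeff * sparsity, recon, sparsity
```

Both terms are averages over datapoints, so a feature that appears in many datapoints contributes more to the total than a rare one of the same magnitude.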
- Lee Sharkey 2 May 2023 23:03 UTC
  LW: 5 AF: 3
  I strongly suspect this is the case too!
  
  In fact, we might be able to speed up the learning of common features even further:
  
  Pierre Peigné at SERIMATS has done some interesting work that looks at initialization schemes that speed up learning. If you initialize the autoencoders with a sample of datapoints (e.g. initialize the weights with a sample from the MLP activations dataset), each of which we assume to contain a linear combination of only a few of the ground truth features, then the initial phases of feature recovery are much faster*. We haven’t had time to check, but this initialization is presumably biased toward recovering the most common features first, since they’re the most likely to be in a given data point.
  
  *The ground truth feature recovery metric (MMCS) starts higher at the beginning of autoencoder training, but converges to full recovery at about the same time.
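
A rough sketch of the two ideas in this reply, under stated assumptions: `init_dictionary_from_data` and `mmcs` are hypothetical helper names, the unit-norm rescaling is my own choice, and MMCS is taken here to mean "for each ground truth feature, the max cosine similarity with any learned feature, averaged over ground truth features". None of this is Pierre's or the report's actual code.

```python
import numpy as np

def init_dictionary_from_data(activations, n_dict, rng):
    """Initialize dictionary rows from randomly sampled datapoints.

    activations: (n_samples, d_act) array, e.g. MLP activations.
    Returns an (n_dict, d_act) array with unit-norm rows.
    Requires n_dict <= n_samples (sampling is without replacement).
    """
    idx = rng.choice(activations.shape[0], size=n_dict, replace=False)
    D = activations[idx].astype(float)
    norms = np.linalg.norm(D, axis=1, keepdims=True)
    return D / np.maximum(norms, 1e-8)   # guard against all-zero datapoints

def mmcs(learned, ground_truth):
    """Mean max cosine similarity between learned and ground truth features.

    learned:      (n_learned, d_act) learned dictionary features as rows
    ground_truth: (n_true, d_act) ground truth features as rows
    """
    L = learned / np.linalg.norm(learned, axis=1, keepdims=True)
    G = ground_truth / np.linalg.norm(ground_truth, axis=1, keepdims=True)
    cos = G @ L.T                        # (n_true, n_learned) cosine similarities
    return cos.max(axis=1).mean()        # best match per true feature, then average
```

Usage would look like `W_dec = init_dictionary_from_data(acts, n_dict, np.random.default_rng(0))` before training, then tracking `mmcs(W_dec, true_features)` over training steps on a toy dataset where the ground truth features are known. A commonly occurring feature is more likely to dominate a sampled datapoint, which is the intuition behind the expected bias toward common features.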