Yep, as you say, @Logan Riggs figured out what’s going on here: you evaluated your reconstruction loss on contexts of length 128, whereas I evaluated on contexts of arbitrary length. When I restrict to context length 128, I’m able to replicate your results.
Here’s Logan’s plot for one of your dictionaries (not sure which):
And here’s my replication of Logan’s plot for your layer 1 dictionary:
Interestingly, this does not happen for my dictionaries! Here’s the same plot, but for my layer 1 residual stream output dictionary for pythia-70m-deduped:
(Note that the three plots have different y-axis scales.)
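For concreteness, the only difference between the two evaluation setups is the context length at which the activations are collected. Here’s a minimal sketch of the comparison (using HuggingFace transformers; the randomly initialized `sae` below is just a stand-in so the snippet runs, not anyone’s actual dictionary or training code):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
LAYER = 1  # residual stream output of block 1

# Stand-in SAE so the sketch runs; in practice this would be a trained dictionary.
d_model = 768
sae = torch.nn.Sequential(
    torch.nn.Linear(d_model, 8 * d_model),
    torch.nn.ReLU(),
    torch.nn.Linear(8 * d_model, d_model),
)

@torch.no_grad()
def reconstruction_mse(text, max_length):
    ids = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_length).input_ids
    # hidden_states[0] is the embedding output, so block LAYER's output is hidden_states[LAYER + 1]
    acts = model(ids, output_hidden_states=True).hidden_states[LAYER + 1]
    return ((acts - sae(acts)) ** 2).mean().item()

text = " ".join(["The quick brown fox jumps over the lazy dog."] * 300)
print("length-128 contexts: ", reconstruction_mse(text, max_length=128))
print("length-1024 contexts:", reconstruction_mse(text, max_length=1024))  # GPT-2's full window
```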
Why the difference? I’m not really sure. Two guesses:
The model: GPT2-small uses learned positional embeddings, whereas Pythia models use rotary embeddings.
The training: I train my autoencoders on variable-length sequences up to length 128; left padding is used to pad shorter sequences up to length 128. Maybe this makes a difference somehow.
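Concretely, the preprocessing I have in mind looks something like this (a sketch with a HuggingFace tokenizer; this is my reconstruction of the scheme, not the literal training code):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m-deduped")
tokenizer.pad_token = tokenizer.eos_token  # Pythia's tokenizer has no pad token by default
tokenizer.padding_side = "left"            # shorter sequences get padded on the left

batch = tokenizer(
    ["a short sequence", "a somewhat longer sequence of text to encode"],
    padding="max_length",
    max_length=128,
    truncation=True,
    return_tensors="pt",
)
# batch.input_ids has shape (2, 128); batch.attention_mask marks the real tokens,
# so activations at pad positions can be excluded when training the autoencoder.
```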
In terms of standardizing which metrics to report, I’m torn. On one hand, for the task your dictionaries were trained on (reconstructing activations taken from length-128 sequences), they perform well, and this should be reflected in the metrics. On the other hand, people should be aware that if they just plug your autoencoders into GPT2-small and start doing inference on inputs found in the wild, things will go off the rails pretty quickly. Maybe the answer is that CE diff should be reported both for sequences of the same length as used in training and for arbitrary-length sequences?
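Something like the following, reusing the stand-in `sae` from the sketch above (`long_text` here is any sufficiently long document), would produce both numbers:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
LAYER = 1

@torch.no_grad()
def ce_loss(ids, sae=None):
    """Next-token CE loss, optionally with block LAYER's output replaced by its SAE reconstruction."""
    handle = None
    if sae is not None:
        def patch(module, inputs, output):
            # GPT2Block returns a tuple whose first element is the residual stream
            return (sae(output[0]),) + output[1:]
        handle = model.transformer.h[LAYER].register_forward_hook(patch)
    try:
        return model(ids, labels=ids).loss.item()
    finally:
        if handle is not None:
            handle.remove()

for n_ctx in (128, 1024):  # the training length and a longer context (within GPT-2's window)
    ids = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=n_ctx).input_ids
    print(f"n_ctx={n_ctx}: CE diff = {ce_loss(ids, sae=sae) - ce_loss(ids):.3f}")
```

Reporting the n_ctx=128 number alongside the longer-context number would capture both the in-distribution performance and the failure mode above.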
The fact that Pythia generalizes to longer sequences but GPT-2 doesn’t isn’t very surprising to me: getting long-context generalization to work is a key motivation for rotary embeddings; see, e.g., the original paper https://arxiv.org/abs/2104.09864.
I think the combination of learned positional embeddings and training on only short sequences is likely the issue. Changing either would suffice.
Makes sense. Will set off some runs with longer context sizes and track this in the future.