Arthur Conmy comments on Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small

Arthur Conmy 11 Feb 2024 0:25 UTC
3 points
The fact that Pythia generalizes to longer sequences but GPT-2 doesn’t isn’t very surprising to me: Pythia uses rotary positional embeddings, whereas GPT-2 uses learned absolute position embeddings, and getting long-context generalization to work is a key motivation for rotary. See e.g. the original paper https://arxiv.org/abs/2104.09864
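
To make the rotary point concrete, here is a minimal sketch of the property that matters: with rotary embeddings, the attention logit between a query and a key depends only on their relative offset, not on their absolute positions. This is a toy pure-Python version, not Pythia's actual implementation; the helper names `rope` and `score`, the 4-dimensional vectors, and the base of 10000 are all assumptions chosen for illustration.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles proportional to `pos`.

    Hypothetical helper: real implementations often pair dimensions
    differently and may rotate only part of each head.
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # one frequency per pair of dimensions
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def score(q, k, q_pos, k_pos):
    """Attention logit between a rotated query and a rotated key."""
    return sum(a * b for a, b in zip(rope(q, q_pos), rope(k, k_pos)))

# Arbitrary toy query and key vectors (assumed values).
q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

# Same relative offset (3) at very different absolute positions.
near = score(q, k, q_pos=5, k_pos=2)
far = score(q, k, q_pos=2005, k_pos=2002)
print(math.isclose(near, far, abs_tol=1e-9))
```

The two logits agree up to floating-point error, so a head never sees an absolute position it wasn't trained on, only offsets. This alone doesn't guarantee good behaviour at long context, but a learned absolute embedding table like GPT-2's has no comparable property.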