Arthur Conmy comments on Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small

Arthur Conmy 2 Feb 2024 16:48 UTC
(This reply is less important than my other one.)
> The network itself doesn’t have a million different algorithms to perform a million different narrow subtasks
For what it’s worth, this sort of thinking is really not obvious to me at all. It seems very plausible that frontier models get their amazing capabilities only through the aggregation of a huge number of dumb heuristics (as an aside, I think that if this is true, it is net positive for alignment). This is consistent with findings that, e.g., grokking and phase changes are much less common in LLMs than in toy models.
(Two objections to these claims: current frontier models are plausibly limited in important ways, and it’s really hard to prove either of us correct on this point since it’s all hand-wavy.)