Joseph Bloom comments on Matryoshka Sparse Autoencoders

Joseph Bloom 14 Dec 2024 7:10 UTC
5 points
Cool work! I’d be excited to see whether latents found via this method are higher-quality linear classifiers when they appear to track concepts (e.g., first letters), and also whether they enable us to train better classifiers over model internals than other SAE architectures or linear probes (https://transformer-circuits.pub/2024/features-as-classifiers/index.html).
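
To make the first comparison concrete, here is a minimal sketch of how one might score a single latent as a classifier for a concept like "token starts with s". It is not taken from the post or the linked paper: the function names, the choice of AUROC and best-threshold F1 as metrics, and the toy activations are all assumptions for illustration. In practice the scores would be real latent activations from each SAE architecture (or probe logits) on a labelled token set.

```python
def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_f1(scores, labels):
    """Best F1 over thresholds of the form 'predict positive if score >= t'."""
    best = 0.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        if tp:
            best = max(best, 2 * tp / (2 * tp + fp + fn))
    return best


# Toy data (made up): activations of one hypothetical "starts with s" latent
# on eight tokens, and whether each token really starts with "s".
# The last token is a deliberate miss: the label is 1 but the latent is silent.
latent_acts = [2.1, 0.0, 1.7, 0.0, 0.3, 0.0, 1.9, 0.0]
starts_with_s = [1, 0, 1, 0, 0, 0, 1, 1]

print("AUROC:", auroc(latent_acts, starts_with_s))
print("best F1:", best_f1(latent_acts, starts_with_s))
```

Running the same two functions on the best-matching latent from each SAE variant, and on a linear probe's logits, would give a like-for-like comparison on the same labelled tokens.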