Hey Robert, great work! My focus isn't currently on this, but I thought I'd mention that these trends might relate to some of the observations in the Finding Neurons in a Haystack paper (https://arxiv.org/abs/2305.01610).
If you haven't read the paper, the short version is that they used sparse probing to find neurons that linearly encode variables like "is this Python code" across a variety of models of varying size.
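To make the technique concrete, here's a minimal sketch of a sparse linear probe on neuron activations. It isn't the paper's exact k-sparse probing setup, and the data, shapes, and hyperparameters are all made up for illustration:

```python
# Minimal sketch of sparse probing: fit an L1-regularized linear probe on
# MLP neuron activations to predict a binary feature (e.g. "is this Python
# code"), then look at which neurons carry nonzero weight. This simplifies
# the paper's k-sparse probing; it is not their exact method.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical data: activations of 512 neurons on 1000 tokens, plus labels.
acts = rng.normal(size=(1000, 512))     # stand-in for real activations
labels = rng.integers(0, 2, size=1000)  # stand-in for "is Python code"

# The L1 penalty drives most probe weights to zero, so the surviving
# neurons are candidates for (sparsely) encoding the feature.
probe = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
probe.fit(acts, labels)

candidate_neurons = np.flatnonzero(probe.coef_[0])
print(f"{len(candidate_neurons)} neurons with nonzero probe weight:",
      candidate_neurons[:10])
```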
The specific observation which I believe may be relevant:
“As models increase in size, representation sparsity increases on average, but different features obey different dynamics: some features with dedicated neurons emerge with scale, others split into finer grained features with scale, and many remain unchanged or appear somewhat randomly”
I believe this accords with your observation that “Finding more features finds more high-MCS features, but finds even more low-MCS features”.
Maybe finding ways to directly compare the two approaches could support further use of either.
Also, it's interesting to hear about using EMD (Earth Mover's Distance) over KL divergence. I hadn't thought about that!
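For what it's worth, here's a toy comparison of where the two metrics diverge (the distributions are invented, so this isn't your actual setup): KL divergence blows up when supports don't overlap and ignores how far apart the mass sits, while EMD degrades gracefully with distance.

```python
# Toy illustration of EMD (Wasserstein distance) vs. KL divergence on
# histograms with disjoint support. Purely illustrative.
import numpy as np
from scipy.stats import entropy, wasserstein_distance

bins = np.arange(10)
p = np.zeros(10); p[1] = 1.0            # all mass on bin 1
q_near = np.zeros(10); q_near[2] = 1.0  # all mass on bin 2 (close)
q_far = np.zeros(10); q_far[8] = 1.0    # all mass on bin 8 (far)

print("KL(p || q_near):", entropy(p, q_near))  # inf: supports don't overlap
print("KL(p || q_far): ", entropy(p, q_far))   # inf: same problem
print("EMD(p, q_near): ", wasserstein_distance(bins, bins, p, q_near))  # 1.0
print("EMD(p, q_far):  ", wasserstein_distance(bins, bins, p, q_far))   # 7.0
```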