Adam Jermyn comments on Polysemanticity and Capacity in Neural Networks

Adam Jermyn 19 Oct 2022 1:25 UTC
LW: 3 AF: 2
Oh I see! Sorry, I didn’t realize you were describing a process for picking features.
I think this is a good idea to try, though I do have a concern. My worry is that if you do this on a model where you know what the features actually are, the procedure may discover some heavily polysemantic “feature” that makes better use of capacity than any of the actual features in the problem. Because dL/dC_i is not a linear function of the feature’s embedding vector, there can exist superpositions of features which have greater dL/dC_i than any individual feature.
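To make the nonlinearity point concrete, here is a minimal toy sketch. It does not compute the real dL/dC_i; `toy_score` is a hypothetical stand-in chosen only because it is nonlinear (concave) in each overlap with a true feature, and the two orthonormal feature directions are likewise an assumption for illustration. Under those assumptions, an equal superposition of the two features scores higher than either feature alone.

```python
import math

# Two orthonormal "true" feature directions in a 2-D embedding space.
# (Toy assumption for illustration, not taken from any real model.)
FEATURES = [(1.0, 0.0), (0.0, 1.0)]


def dot(u, v):
    """Plain dot product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Rescale v to unit Euclidean norm."""
    norm = math.sqrt(dot(v, v))
    return tuple(a / norm for a in v)


def toy_score(v):
    """Hypothetical nonlinear stand-in for dL/dC_i (NOT the real quantity).

    Sums the square root of the absolute overlap with each true feature,
    so the score is concave in each overlap rather than linear in v.
    """
    return sum(math.sqrt(abs(dot(v, f))) for f in FEATURES)


# Score each true feature direction on its own.
pure_scores = [toy_score(f) for f in FEATURES]

# Score a unit-norm equal superposition of the two features.
mixed = normalize((1.0, 1.0))
mixed_score = toy_score(mixed)

print("pure feature scores:", pure_scores)
print("superposition score:", mixed_score)
print("superposition wins:", mixed_score > max(pure_scores))
```

Whether the real dL/dC_i behaves like this depends on the model and loss; the sketch only shows that once the score is nonlinear in the direction, nothing forces its maximum to sit on a true feature.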
Anyway, I think this is a good thing to try, and I’d encourage someone to do so! I’m happy to offer guidance/feedback/chat with people interested in pursuing this, as automated feature identification seems like a really useful thing to have even if it turns out to be really expensive.