Hi, nice work! You mentioned the possibility of neurons being the wrong unit. I think this is the case, and our current best guess for the right unit is directions in the output space, i.e. linear combinations of neurons.
We’ve done some work using dictionary learning to find these directions (see original post, recent results), and we find that sparse coding gives us dictionaries of features that are more interpretable than the neuron basis (though they don’t explain 100% of the variance).
We’d be really interested to see how this compares to neurons in a test like this and could get a sparse-coded breakdown of gpt2-small layer 6 if you’re interested.
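To make the idea of directions as linear combinations of neurons concrete, here is a minimal sketch of the sparse-coding step. It is not the code behind the results mentioned above: the dictionary is random rather than learned, the activations are synthetic rather than taken from gpt2-small, and the sizes, the L1 penalty, and the function names are all assumptions chosen for illustration. Codes are inferred with ISTA (iterative soft-thresholding), and the fraction of variance explained comes out below 100% because the L1 penalty shrinks the codes.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_neurons, n_features = 256, 16, 32  # assumed toy sizes

# Hypothetical dictionary: each row is one unit-norm direction in neuron space,
# i.e. a linear combination of neurons. A real dictionary would be learned.
dictionary = rng.normal(size=(n_features, n_neurons))
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)

# Synthetic activations: each sample mixes 3 randomly chosen directions.
true_codes = np.zeros((n_samples, n_features))
for row in true_codes:
    active = rng.choice(n_features, size=3, replace=False)
    row[active] = rng.uniform(0.5, 1.5, size=3) * rng.choice([-1.0, 1.0], size=3)
acts = true_codes @ dictionary


def soft_threshold(x, t):
    """Shrink every entry towards zero by t, zeroing anything smaller than t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def sparse_codes(acts, dictionary, l1=0.05, n_steps=200):
    """ISTA for 0.5 * ||acts - codes @ dictionary||^2 + l1 * ||codes||_1."""
    # Step size 1/L, where L is the squared largest singular value of the dictionary.
    lipschitz = np.linalg.norm(dictionary, ord=2) ** 2
    codes = np.zeros((acts.shape[0], dictionary.shape[0]))
    for _ in range(n_steps):
        grad = (codes @ dictionary - acts) @ dictionary.T
        codes = soft_threshold(codes - grad / lipschitz, l1 / lipschitz)
    return codes


def fraction_variance_explained(acts, recon):
    """1 - (residual sum of squares) / (total sum of squares about the mean)."""
    total = ((acts - acts.mean(axis=0)) ** 2).sum()
    return 1.0 - ((acts - recon) ** 2).sum() / total


codes = sparse_codes(acts, dictionary)
recon = codes @ dictionary
fve = fraction_variance_explained(acts, recon)
print(f"fraction of variance explained: {fve:.3f}")
print(f"mean nonzero codes per sample: {(codes != 0).sum(axis=1).mean():.1f}")
```

Learning the dictionary itself (for example by alternating this inference step with a dictionary update, or by training a sparse autoencoder) is left out of the sketch.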
Thank you Hoagy. Expanding beyond the neuron unit is a high priority. I’d like to work with you, Logan Riggs, and others to figure out a good way to make this happen in the next major update so that people can easily view, test, and contribute. I’m now creating a new channel on the discord (#directions) to discuss this: https://discord.gg/kpEJWgvdAx, or I’ll DM you my email if you prefer that.