Clement Neo comments on We Found An Neuron in GPT-2

Clement Neo 12 Feb 2023 10:57 UTC
6 points
1
We chose the dot product over cosine similarity because the dot product is the neuron’s direct effect on the logits (unembedding takes the dot product of the residual stream with the embedding matrix).
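To make the difference concrete, here is a minimal sketch with made-up 3-dimensional vectors (not real GPT-2 weights). The function names `dot_congruence` and `cosine_congruence` are hypothetical, and the sketch ignores the final layer norm and the neuron's activation value. It only illustrates that scaling a neuron's output weights scales the dot product, while cosine similarity stays the same.

```python
import numpy as np

def dot_congruence(w_out, token_embedding):
    # Dot product of a neuron's output weights with a token's embedding.
    return float(np.dot(w_out, token_embedding))

def cosine_congruence(w_out, token_embedding):
    # Same dot product, but normalised by both vector norms.
    norms = np.linalg.norm(w_out) * np.linalg.norm(token_embedding)
    return float(np.dot(w_out, token_embedding) / norms)

# Toy vectors, purely illustrative.
token_embedding = np.array([1.0, 2.0, 0.0])
weak_neuron = np.array([0.1, 0.2, 0.0])
strong_neuron = 10.0 * weak_neuron  # same direction, 10x the scale

print(dot_congruence(weak_neuron, token_embedding),
      dot_congruence(strong_neuron, token_embedding))
print(cosine_congruence(weak_neuron, token_embedding),
      cosine_congruence(strong_neuron, token_embedding))
```

The two neurons point in the same direction, so cosine similarity cannot tell them apart, even though the second one would push the token's logit ten times harder for the same activation.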

I think your point on using the scale of $W_{in}$ if we are concerned about the scale of $W_{out}$ is fair. We didn’t really look at how the rest of the network interacts with this neuron through its input weights, but perhaps an input-scaled congruence score (e.g. output congruence * average of squared input weights) could give us a better representation of a neuron’s relevance for a token.
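A rough sketch of what such a score might look like, taking the parenthetical above literally. The function name `input_scaled_congruence` and the toy vectors are assumptions for illustration. We have not tested this on the actual model.

```python
import numpy as np

def input_scaled_congruence(w_in, w_out, token_embedding):
    # Output congruence: dot product of the neuron's output weights
    # with the token's embedding.
    output_congruence = float(np.dot(w_out, token_embedding))
    # Input scale: average of the squared input weights of the neuron.
    input_scale = float(np.mean(np.square(w_in)))
    return output_congruence * input_scale

# Toy vectors, purely illustrative (not real GPT-2 weights).
token_embedding = np.array([1.0, 2.0, 0.0])
w_out = np.array([0.1, 0.2, 0.0])
w_in_small = np.array([0.2, -0.2, 0.2])
w_in_large = np.array([2.0, -2.0, 2.0])

score_small = input_scaled_congruence(w_in_small, w_out, token_embedding)
score_large = input_scaled_congruence(w_in_large, w_out, token_embedding)
print(score_small, score_large)
```

With identical output weights, the neuron with larger input weights gets the higher score, which is the intended correction if a small $W_{out}$ is being compensated for by a large $W_{in}$.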