I haven’t read this properly but my guess is that this whole analysis is importantly wrong to some extent because you haven’t considered layernorm. It only makes sense to interpret embeddings in the layernorm space.
Edit: I have now read most of this and I don’t think anything you say is wrong exactly, but I do think layernorm is playing a crucial role that you should not be ignoring.
But the post is still super interesting!
Others have since suggested that the vagueness of the definitions at small and large distances from the centroid is a side effect of layernorm. This seemed plausible at the time, but not so much now that I’ve just found this:
The prompt “A typical definition of "" would be "” (there’s no customised embedding involved; we’re just eliciting a definition of the null string) gives “A person who is a member of a group.” at temperature 0. And I’ve had confirmation from someone with GPT-4 base model access that it does exactly the same thing, so I’d expect this behaviour to hold across all GPT models (a shame GPT-3 is no longer available to test this).
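If anyone wants to check this on an open model, here’s a minimal reproduction sketch (the model choice and token budget are my assumptions, not a claim about the original setup; greedy decoding is the deterministic equivalent of temperature 0):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-j-6B"  # assumption: any GPT-style causal LM should do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = 'A typical definition of "" would be "'  # null string between the quotes
inputs = tokenizer(prompt, return_tensors="pt")
# do_sample=False is greedy decoding, i.e. the temperature-0 behaviour
output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:]))
```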
Could you elaborate on the role you think layernorm is playing? You’re not the first person to suggest this, and I’d be interested to explore further. Thanks!
Any time the embeddings / residual stream vectors are used for anything, they are first projected onto the surface of an (n−1)-dimensional hypersphere (layernorm subtracts the mean and rescales to a fixed norm, so the raw distance from the centroid is discarded). This changes the geometry.
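A minimal numerical sketch of that projection (assuming plain layernorm with the learned scale and shift omitted): inputs of wildly different norms all come out with the same norm, √n, and zero mean, i.e. they land on a fixed hypersphere inside the mean-zero subspace.

```python
import torch

# Plain layernorm with no learned gamma/beta (an assumption for illustration).
def layernorm(x, eps=1e-5):
    mu = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=False)
    return (x - mu) / torch.sqrt(var + eps)

n = 8
# Five vectors with norms spanning four orders of magnitude.
x = torch.randn(5, n) * torch.logspace(-2, 2, 5).unsqueeze(1)

y = layernorm(x)
print(y.norm(dim=-1))  # all ≈ sqrt(n) ≈ 2.828: same distance from the origin
print(y.sum(dim=-1))   # all ≈ 0: confined to the mean-zero subspace
```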