I haven’t read this properly but my guess is that this whole analysis is importantly wrong to some extent because you haven’t considered layernorm. It only makes sense to interpret embeddings in the layernorm space.
Edit: I have now read most of this and I don’t think anything you say is wrong exactly, but I do think layernorm is playing a crucial role that you should not be ignoring.
But the post is still super interesting!
Others have since suggested that the vagueness of the definitions at small and large distances from the centroid is a side effect of layernorm. This seemed plausible at the time, but not so much now that I’ve just found this:
The prompt “A typical definition of "" would be "” (there’s no customised embedding involved; we’re just eliciting a definition of the null string) gives “A person who is a member of a group.” at temperature 0. And I’ve had confirmation from someone with GPT-4 base model access that it does exactly the same thing, so I’d expect this behaviour to hold across all GPT models (a shame GPT-3 is no longer available to test this).
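If anyone wants to check this on an open model, here’s a minimal reproduction sketch (the model choice and token budget are my assumptions, not a claim about the original setup; greedy decoding is the deterministic equivalent of temperature 0):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-j-6B"  # assumption: any GPT-style causal LM should do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = 'A typical definition of "" would be "'  # null string between the quotes
inputs = tokenizer(prompt, return_tensors="pt")
# do_sample=False is greedy decoding, i.e. the temperature-0 behaviour
output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:]))
```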
Could you elaborate on the role you think layernorm is playing? You’re not the first person to suggest this, and I’d be interested to explore further. Thanks!
Any time the embeddings / residual stream vectors are used for anything, they are first projected onto the surface of an (n−1)-dimensional hypersphere (layernorm subtracts the mean and rescales to a fixed norm, so the raw distance from the centroid is discarded). This changes the geometry.
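A minimal numerical sketch of that projection (assuming plain layernorm with the learned scale and shift omitted): inputs of wildly different norms all come out with the same norm, √n, and zero mean, i.e. they land on a fixed hypersphere inside the mean-zero subspace.

```python
import torch

# Plain layernorm with no learned gamma/beta (an assumption for illustration).
def layernorm(x, eps=1e-5):
    mu = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=False)
    return (x - mu) / torch.sqrt(var + eps)

n = 8
# Five vectors with norms spanning four orders of magnitude.
x = torch.randn(5, n) * torch.logspace(-2, 2, 5).unsqueeze(1)

y = layernorm(x)
print(y.norm(dim=-1))  # all ≈ sqrt(n) ≈ 2.828: same distance from the origin
print(y.sum(dim=-1))   # all ≈ 0: confined to the mean-zero subspace
```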