(see my reply to Charlie Steiner’s comment)
mwatkins
I’m well aware of the danger of pareidolia with language models. First, I should state I didn’t find that particular set of outputs “titillating”, but rather deeply disturbing (e.g. definitions like “to make a woman’s body into a cage” and “a woman who is sexually aroused by the idea of being raped”). The point of including that example is that I’ve run hundreds of these experiments on random embeddings at various distances-from-centroid, and I’ve seen the “holes” thing appearing, everywhere, in small numbers, leading to the reasonable question “what’s up with all these holes?”. The unprecedented concentration of them near that particular random embedding, and the intertwining themes of female sexual degradation led me to consider the possibility that it was related to the prominence of sexual/procreative themes in the definition tree for the centroid.
More of those definition trees can be seen in this appendix to my last post:
https://www.lesswrong.com/posts/hincdPwgBTfdnBzFf/mapping-the-semantic-void-ii-above-below-and-between-token#Appendix_A__Dive_ascent_data
I’ve thrown together a repo here (from some messy Colab sheets):
https://github.com/mwatkins1970/GPT_definition_trees
Hopefully this makes sense. You specify a token or non-token embedding, and one script generates a .json file with a nested tree structure. Another script then renders that as a PNG. You just need to have first loaded GPT-J’s model, embeddings tensor and tokenizer, and to specify a save directory. Let me know if you have any trouble with this.
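For reference, here’s a minimal setup sketch for those prerequisites using Hugging Face transformers (the variable names are illustrative, not necessarily the ones the repo’s scripts expect):

```python
import os
import torch
from transformers import AutoTokenizer, GPTJForCausalLM

model_name = "EleutherAI/gpt-j-6B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = GPTJForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model.eval()

# Input embedding matrix: one row per token (4096-dimensional for GPT-J).
embeddings = model.transformer.wte.weight.detach()

# Directory for the generated .json definition trees and rendered PNGs.
save_dir = "definition_trees"
os.makedirs(save_dir, exist_ok=True)
```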
Quite possibly it does, but I doubt very many of these synonyms are tokens.
Phallocentricity in GPT-J’s bizarre stratified ontology
Thanks! That’s the best explanation I’ve yet encountered. There had been previous suggestions that layer norm is a major factor in this phenomenon.
Mapping the semantic void III: Exploring neighbourhoods
Mapping the semantic void II: Above, below and between token embeddings
I did some spelling evals with GPT2-xl and -small last year and discovered that they’re pretty terrible at spelling! Even with multishot prompting and supplying the first letter, the output seems to be heavily conditioned on that first letter, sometimes affected by the specifics of the prompt, and reminiscent of very crude bigrammatic or trigrammatic spelling algorithms.
This was the prompt (in this case eliciting a spelling for the token ‘that’):
Please spell ‘table’ in all capital letters, separated by hyphens.
T-A-B-L-E
Please spell ‘nice’ in all capital letters, separated by hyphens.
N-I-C-E
Please spell ‘water’ in all capital letters, separated by hyphens.
W-A-T-E-R
Please spell ‘love’ in all capital letters, separated by hyphens.
L-O-V-E
Please spell ‘that’ in all capital letters, separated by hyphens.
T-
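Here’s a minimal sketch of how a prompt like this can be run against GPT-2-xl with the Hugging Face transformers library (greedy decoding is assumed here, which isn’t necessarily the exact setting I used):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-xl")
model = GPT2LMHeadModel.from_pretrained("gpt2-xl")
model.eval()

# Four-shot spelling prompt, ending partway through the spelling of 'that'.
prompt = (
    "Please spell 'table' in all capital letters, separated by hyphens.\n"
    "T-A-B-L-E\n"
    "Please spell 'nice' in all capital letters, separated by hyphens.\n"
    "N-I-C-E\n"
    "Please spell 'water' in all capital letters, separated by hyphens.\n"
    "W-A-T-E-R\n"
    "Please spell 'love' in all capital letters, separated by hyphens.\n"
    "L-O-V-E\n"
    "Please spell 'that' in all capital letters, separated by hyphens.\n"
    "T-"
)

input_ids = tokenizer(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    output = model.generate(
        input_ids,
        max_new_tokens=12,                    # room for a hyphenated spelling
        do_sample=False,                      # greedy decoding
        pad_token_id=tokenizer.eos_token_id,  # silences the padding warning
    )
# Print only the continuation, i.e. the model's attempted spelling.
print(tokenizer.decode(output[0][input_ids.shape[1]:]))
```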
Outputs seen, by first letter:
‘a’ words: ANIGE, ANIGER, ANICES, ARING
‘b’ words: BOWARS, BORSE
‘c’ words: CANIS, CARES x 3
‘d’ words: DOWER, DONER
‘e’ words: EIDSON
‘f’ words: FARIES x 5
‘g’ words: GODER, GING x 3
‘h’ words: HATER x 6, HARIE, HARIES
‘i’ words: INGER
‘j’ words: JOSER
‘k’ words: KARES
‘l’ words: LOVER x 5
‘n’ words: NOTER x 2, NOVER
‘o’ words: ONERS x 5, OTRANG
‘p’ words: PARES x 2
‘t’ words: TABLE x 10
‘u’ words: UNSER
‘w’ words: WATER x 6
‘y’ words: YOURE, YOUSE
Note how they’re all “wordy” (in terms of combinations of vowels and consonants), mostly non-words, with a lot of ER and a bit of ING.
Reducing to three shots, we see similar (but slightly different) misspellings:
CONES, VICER, MONERS, HOTERS, KATERS, FATERS, CANIS, PATERS, GINGE, PINGER, NICERS, SINGER, DONES, LONGER, JONGER, LOUSE, HORSED, EICHING, UNSER, ALEST, BORSET, FORSED, ARING
My notes claim: “Although the overall spelling is pretty terrible, GPT-2xl can do second-letter prediction (given first) considerably better than chance (and significantly better than bigrammatically-informed guessing).”
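For clarity, this is roughly how such a comparison could be scored; the word list and results below are placeholders for illustration, not the actual data behind that claim:

```python
from collections import Counter, defaultdict

# Placeholder reference word list and placeholder (word, model's guessed second
# letter) pairs -- substitute real eval data here.
word_list = ["table", "nice", "water", "love", "that"]
results = [("house", "O"), ("night", "I")]

# Bigram-style baseline: for each first letter, always guess the most common
# second letter following it in the reference word list.
second_given_first = defaultdict(Counter)
for w in word_list:
    second_given_first[w[0].upper()][w[1].upper()] += 1
baseline_guess = {first: counts.most_common(1)[0][0]
                  for first, counts in second_given_first.items()}

model_hits = sum(guess.upper() == w[1].upper() for w, guess in results)
baseline_hits = sum(baseline_guess.get(w[0].upper(), "") == w[1].upper()
                    for w, _ in results)
print(f"model: {model_hits}/{len(results)}, "
      f"bigram baseline: {baseline_hits}/{len(results)}")
```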
‘ petertodd’’s last stand: The final days of open GPT-3 research
Thanks so much for leaving this comment. I suspected that psychologists or anthropologists might have something to say about this. Do you know anyone actively working in this area who might be interested?
Thanks! I’m starting to get the picture (insofar as that’s possible).
Could you elaborate on the role you think layernorm is playing? You’re not the first person to suggest this, and I’d be interested to explore further. Thanks!
Thanks for the elucidation! This is really helpful and interesting, but I’m still left somewhat confused.
Your concise demonstration immediately convinced me that any Gaussian distributed around a point some distance from the origin in high-dimensional Euclidean space would have the property I observed in the distribution of GPT-J embeddings, i.e. their norms will be normally distributed in a tight band, while their distances-from-centroid will also be normally distributed in a (smaller) tight band. So I can concede that this has nothing to do with where the token embeddings ended up as a result of training GPT-J (as I had imagined) and is instead a general feature of Gaussian distributions in high dimensions.
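For anyone who wants to see this concretely, here’s a quick numerical sketch (the dimension matches GPT-J’s 4096, but the offset and scale are arbitrary illustrative values rather than anything fitted to GPT-J):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4096, 5000               # dimension (GPT-J uses 4096) and sample count
centre = np.zeros(d)
centre[0] = 2.0                 # shift the Gaussian cloud away from the origin
points = centre + 0.02 * rng.standard_normal((n, d))

norms = np.linalg.norm(points, axis=1)                        # distance from origin
dists = np.linalg.norm(points - points.mean(axis=0), axis=1)  # distance from centroid

# Both quantities concentrate in tight bands, even though the cloud is just a
# single isotropic Gaussian.
print(f"norms:                   mean {norms.mean():.3f}, std {norms.std():.4f}")
print(f"distances from centroid: mean {dists.mean():.3f}, std {dists.std():.4f}")
```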
However, I’m puzzled by “Suddenly it looks like a much smaller shell!”
Don’t these histograms unequivocally indicate the existence of two separate shells with different centres and radii, both of which contain the vast bulk of the points in the distribution? Yes, there’s only one distribution of points, but it still seems like it’s almost entirely contained in the intersection of a pair of distinct hyperspherical shells.
The intended meaning was that the set of points in embedding space corresponding to the 50257 tokens are contained in a particular volume of space (the intersection of two hyperspherical shells).
Mapping the semantic void: Strange goings-on in GPT embedding spaces
Thanks for pointing this out! They should work now.
Linear encoding of character-level information in GPT-J token embeddings
Thanks! And in case it wasn’t clear from the article, the tokens whose misspellings are examined in the Appendix are not glitch tokens.
No, but it would be interesting to try. Someone somewhere might have compiled a list of indices for GPT-2/3/J tokens which are full words, but I’ve not yet been able to find one.