OK, so maybe this is a cool new way to look at certain aspects of GPT ontology… but why this primordial ontological role for the penis? I imagine Freud would have something to say about this. Perhaps I’ll run a GPT4 Freud simulacrum and find out (potentially) what.
My guess is that humans tend to use a lot of vague euphemisms when talking about sex and genitalia.
In a lot of contexts, “Are they doing it?” would refer to sex, because humans often prefer to keep some level of plausible deniability.
This leaves a residual belief that vagueness implies sexual content.
Adding to this, basically every word can be used as a sexual euphemism, especially for genitalia. So it doesn’t surprise me at all that the centroid, which is sort of the average of all words, can be defined as such.
It would still be astonishing that GPT-J would pick up this pattern.
Why would all of these euphemisms cancel out at the centroid and not any other of a thousand other things that use euphemisms and metaphors? Any boolean sarcasm or irony would do.
It might be astonishing, but this is fundamentally how word embedding works, by modelling the co-distribution of words/expressions. You know the “nudge, nudge, you know what I mean” Monty Python sketch? Try appending “if you know what I mean” to the end of random sentences.
There is more than one possibility when you append “if you know what I mean” to the end of a random sentence:
Sexual innuendos.
Illicit activities or behaviors.
Inside jokes or references understood only by a specific group.
Subtle insults or mocking.
Sure, the first is the strongest, but the others would move the centroid away from “phallus”. The centroid is not at the most likely item but at the average.
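To make the mean-versus-mode point concrete, here is a toy sketch. Everything in it is an assumption for illustration: the 2-D "meaning" vectors, the relative frequencies, and the helper names `weighted_centroid` and `nearest` are all made up, and nothing here comes from GPT-J's actual embeddings.

```python
import math

# Hypothetical 2-D "meaning" vectors for the four readings listed above.
# The coordinates are invented purely for illustration.
readings = {
    "sexual innuendo": (1.0, 0.0),
    "illicit activity": (0.0, 1.0),
    "inside joke": (-1.0, 0.0),
    "subtle insult": (0.0, -1.0),
}

# Hypothetical relative frequencies; the first reading is the strongest.
weights = {
    "sexual innuendo": 0.55,
    "illicit activity": 0.15,
    "inside joke": 0.15,
    "subtle insult": 0.15,
}


def weighted_centroid(vectors, weights):
    """Frequency-weighted average of the vectors, one coordinate at a time."""
    dims = len(next(iter(vectors.values())))
    total = sum(weights.values())
    return tuple(
        sum(weights[k] * vectors[k][d] for k in vectors) / total
        for d in range(dims)
    )


def nearest(point, vectors):
    """Name of the vector closest to `point` in Euclidean distance."""
    return min(vectors, key=lambda k: math.dist(point, vectors[k]))


centroid = weighted_centroid(readings, weights)
mode = max(weights, key=weights.get)  # the single most likely reading

print("most likely reading:", mode)
print("centroid:", tuple(round(c, 3) for c in centroid))
print("distance from centroid to mode:",
      round(math.dist(centroid, readings[mode]), 3))
print("reading nearest the centroid:", nearest(centroid, readings))
```

With these made-up numbers the other readings do drag the centroid away from the strongest one, so it does not sit on the mode. Whether the centroid still ends up closest to that reading depends entirely on how dominant it is, which is the part the two comments above disagree about.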
I’d guess that it’s related specifically to “thing” being a euphemism for penis, as opposed to some broader generalization about euphemisms.
“thing” wasn’t part of the prompt.