There’s a huge amount of room for you to find whatever patterns are most eye-catching to you, here.
I was sampling random embeddings at various distances from the centroid and prompting GPT-J to define them. One of these random embeddings, sampled at distance 5, produced the definition [...]
How many random embeddings did you try sampling that weren’t titillating? Suppose you kept looking until you found mentions of female sexuality again: would these also sometimes talk about holes, or would they instead sometimes talk about something totally different?
I’m well aware of the danger of pareidolia with language models. First, I should state that I didn’t find that particular set of outputs “titillating”, but rather deeply disturbing (e.g. definitions like “to make a woman’s body into a cage” and “a woman who is sexually aroused by the idea of being raped”). The point of including that example is that I’ve run hundreds of these experiments on random embeddings at various distances from the centroid, and I’ve seen the “holes” theme appearing everywhere, in small numbers, which raises the reasonable question “what’s up with all these holes?”. The unprecedented concentration of them near that particular random embedding, together with the intertwining themes of female sexual degradation, led me to consider the possibility that this was related to the prominence of sexual/procreative themes in the definition tree for the centroid.
It would still be nice to see the 10 closest, with no choosing interesting ones. I want to see the boring ones too.
The 10 closest to what? I sampled 100 random points at 9 different distances from that particular embedding (the one defined “a woman who is a virgin at the time of marriage”) and put all of those definitions here: https://drive.google.com/file/d/11zDrfkuH0QcOZiVIDMS48g8h1383wcZI/view?usp=sharing
There’s no way of meaningfully talking about the 10 closest embeddings to a given embedding, since the space is continuous and there are infinitely many points arbitrarily close to any given one (and if we did choose 10 at random at a very small distance from it, they would certainly produce exactly the same definition as it does).
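Neither comment spells out exactly how the random points were drawn. As a minimal sketch, assuming “at distance r” means Euclidean distance and that directions are chosen uniformly at random, one way to sample points on a shell around an embedding (the centroid, or the “virgin at the time of marriage” embedding) is to normalise a vector of Gaussian draws. The helper names, the zero-vector centre and the radii below are hypothetical placeholders; the real embeddings and the GPT-J prompting step are not shown.

```python
import math
import random


def sample_at_distance(center, radius, rng=None):
    """Return a point exactly `radius` away (Euclidean) from `center`,
    in a direction chosen uniformly at random.

    Hypothetical helper: assumes uniform directions, which the thread
    does not confirm.
    """
    if rng is None:
        rng = random.Random()
    # Normalising a vector of i.i.d. standard normal draws gives a
    # direction uniformly distributed on the unit hypersphere.
    direction = [rng.gauss(0.0, 1.0) for _ in center]
    norm = math.sqrt(sum(x * x for x in direction))
    return [c + radius * x / norm for c, x in zip(center, direction)]


def sample_shells(center, radii, n_per_radius, seed=0):
    """Map each radius to `n_per_radius` random points at that distance."""
    rng = random.Random(seed)
    return {
        r: [sample_at_distance(center, r, rng) for _ in range(n_per_radius)]
        for r in radii
    }


if __name__ == "__main__":
    # Toy stand-in: GPT-J-6B token embeddings are 4096-dimensional, but this
    # zero vector is NOT a real centroid, and these radii are placeholders
    # (the thread mentions 9 distances, including 5, without listing them).
    center = [0.0] * 4096
    shells = sample_shells(center, radii=[1, 2, 3, 4, 5, 6, 7, 8, 9], n_per_radius=100)
    print(len(shells), len(shells[5]))  # 9 shells of 100 points each
```

Because directions are drawn from a continuous distribution, two samples at the same small radius are essentially never identical, which is one way to see why “the 10 closest embeddings” is not a well-defined set.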
If you sample random embeddings at distance 5 from the centroid (where I found that “disturbing” definition cluster), you’ll regularly see things like “a person who is a member of a group”, “a member of the British royal family” and “to make a hole in something” (a small number of these themes and their variants seem to dominate the embedding space at that distance from the centroid), punctuated by definitions like these:
“a piece of cloth or other material used to cover the head of a bed or a person lying on it”, “a small, sharp, pointed instrument, used for piercing or cutting”, “to be in a state of confusion, perplexity, or doubt”, “a place where a person or thing is located”, “piece of cloth or leather, used as a covering for the head, and worn by women in the East Indies”, “a person who is a member of a Jewish family, but who is not a Jew by religion”, “a piece of string or wire used for tying or fastening”
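One way to put the base-rate question (how often “holes” show up among the unremarkable samples) on firmer footing than eyeballing is to tally every definition produced at a given distance. A minimal sketch, assuming the definitions have already been collected as a list of strings; the helper names and the toy list are hypothetical, and the counts it produces say nothing about the real samples.

```python
import re
from collections import Counter


def top_definitions(definitions, k=3):
    """Return the k most frequent definitions after light normalisation
    (lower-casing, collapsing whitespace, dropping a trailing period)."""
    normalised = [" ".join(d.lower().split()).rstrip(".") for d in definitions]
    return Counter(normalised).most_common(k)


def keyword_rate(definitions, keyword):
    """Fraction of definitions mentioning `keyword` as a whole word
    (optionally pluralised with -s), case-insensitively."""
    if not definitions:
        return 0.0
    pattern = re.compile(rf"\b{re.escape(keyword)}s?\b", re.IGNORECASE)
    hits = sum(1 for d in definitions if pattern.search(d))
    return hits / len(definitions)


if __name__ == "__main__":
    # Illustrative toy list, NOT real GPT-J output counts.
    sample = [
        "to make a hole in something",
        "a person who is a member of a group",
        "To make a hole in something.",
        "a member of the British royal family",
        "a person who is a member of a group",
        "the whole of something",
    ]
    print(top_definitions(sample, k=2))
    print(keyword_rate(sample, "hole"))  # "whole" is not counted
```

Matching on word boundaries rather than raw substrings keeps words like “whole” from inflating the “hole” count.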
I mean the 10 closest to the centroid position. But I think I misunderstood your methodology there; that would be relevant to your previous post, not this one.
The main thing is that I wanted to see the raw samples and do my own Rorschach test :p So that Google doc is great, thank you.
See this Twitter thread. https://twitter.com/SoC_trilogy/status/1762902984554361014