mwatkins comments on Phallocentricity in GPT-J’s bizarre stratified ontology

mwatkins 17 Feb 2024 17:31 UTC
8 points
3
I’m well aware of the danger of pareidolia with language models. First, I should state that I didn’t find that particular set of outputs “titillating”, but rather deeply disturbing (e.g. definitions like “to make a woman’s body into a cage” and “a woman who is sexually aroused by the idea of being raped”). The point of including that example is that I’ve run hundreds of these experiments on random embeddings at various distances-from-centroid, and I’ve seen the “holes” theme appear everywhere, in small numbers, leading to the reasonable question “what’s up with all these holes?”. The unprecedented concentration of them near that particular random embedding, and the intertwining themes of female sexual degradation, led me to consider the possibility that it was related to the prominence of sexual/procreative themes in the definition tree for the centroid.
- wassname 24 Feb 2024 13:54 UTC
  3 points
  2
  Parent
  It would still be nice to see the 10 closest, with no choosing interesting ones. I want to see the boring ones too.
  - mwatkins 24 Feb 2024 16:23 UTC
    3 points
    1
    Parent
    The 10 closest to what? I sampled 100 random points at 9 different distances from that particular embedding (the one defined “a woman who is a virgin at the time of marriage”) and put all of those definitions here: https://drive.google.com/file/d/11zDrfkuH0QcOZiVIDMS48g8h1383wcZI/view?usp=sharing
    There’s no way of meaningfully talking about the 10 closest embeddings to a given embedding (and if we did choose 10 at random at the smallest possible distance from it, they would certainly produce exactly the same definition as it does).
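    For readers who want to try this kind of sampling themselves, here is a minimal sketch of drawing random points at a fixed Euclidean distance from a given embedding. It is an illustration under stated assumptions, not the code used for the linked file: the function name `sample_at_distance`, the stand-in `centre` vector and the list of distances are all hypothetical (the comment above says 9 distances were used but does not list them), and Euclidean distance is assumed.

```python
import numpy as np


def sample_at_distance(centre, distance, n_samples, rng=None):
    """Return n_samples points lying exactly `distance` (Euclidean) from `centre`.

    Directions are drawn uniformly on the unit sphere by normalising
    standard Gaussian vectors, then scaled and shifted to the centre.
    """
    rng = np.random.default_rng() if rng is None else rng
    directions = rng.standard_normal((n_samples, centre.shape[0]))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    return centre + distance * directions


# Stand-in for a real token-space embedding (assumption: 4096 dimensions,
# matching GPT-J's embedding width). Replace with the embedding of interest.
rng = np.random.default_rng(0)
centre = rng.standard_normal(4096)

# Hypothetical distances; the actual 9 values are not given in the thread.
distances = [0.5, 1.0, 2.0]
samples = {d: sample_at_distance(centre, d, 100, rng) for d in distances}
```

    Each sampled vector would then be passed to the model with a definition-eliciting prompt; that step depends on the model setup and is not shown here.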
    - mwatkins 24 Feb 2024 16:27 UTC
      3 points
      0
      Parent
      If you sample random embeddings at distance 5 from the centroid (where I found that “disturbing” definition cluster), you’ll regularly see things like “a person who is a member of a group”, “a member of the British royal family” and “to make a hole in something” (a small number of these themes and their variants seem to dominate the embedding space at that distance from centroid), punctuated by definitions like these:
      
      “a piece of cloth or other material used to cover the head of a bed or a person lying on it”, “a small, sharp, pointed instrument, used for piercing or cutting”, “to be in a state of confusion, perplexity, or doubt”, “a place where a person or thing is located”, “piece of cloth or leather, used as a covering for the head, and worn by women in the East Indies”, “a person who is a member of a Jewish family, but who is not a Jew by religion”, “a piece of string or wire used for tying or fastening”
    - wassname 25 Feb 2024 1:29 UTC
      1 point
      0
      Parent
      I mean the 10 closest to the centroid position. But I think I misunderstood your methodology there; that would be relevant to your previous work, not this one.
      
      The main thing is that I wanted to see the raw samples and do my own Rorschach test :p So that Google doc is great, thank you.