Phallocentricity in GPT-J’s bizarre stratified ontology

TL;DR This relates to the findings reported in my posts Mapping the Semantic Void parts I and II. By creating a custom embedding at the token centroid (the mean vector of all 50,257 GPT-J token embeddings), prompting the model to define it and considering logits, it’s possible to construct a “definition tree” which consists overwhelmingly of vague generalities. This is hardly surprising, as GPT-J is basically being challenged to define “the average thing”. However, the most probable branch in the tree which gives a definition containing anything specific defines the “ghost token” at the centroid as “a man’s penis”. Lowering the cumulative probability cutoff to produce ever longer lists of possible definitions, we find that almost all the branches which provide definitions involving anything specific are of a sexual/procreative nature, with a few relating to status thrown in. As usual, I have no idea what any of this means, but I welcome suggestions!
Work supported by the Long Term Future Fund.
Defining the centroid
Using a custom embedding method, I applied the prompt
A typical definition of <token> would be '
(which I have been using as part of my “Mapping the Semantic Void” research) to the GPT-J token embedding centroid, which could perhaps be thought of as representing some kind of “mean concept” at the centre of the stratified ontology I’ve been exploring in recent posts.
The temperature 0 output is “A person who is not a member of a group”, something we see for almost all points within distance 0.5 of the centroid (token embeddings typically live at distance 0.9–1.1 from the centroid, in what could be imagined as a ‘fuzzy hyperspherical shell’).
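For anyone who wants to reproduce this kind of experiment, here’s a minimal sketch (not the exact code behind these experiments), assuming the Hugging Face transformers GPT-J-6B checkpoint and a single GPU; the placeholder token ' X', the greedy decoding loop and the variable names are illustrative choices of mine.

```python
import torch
from transformers import AutoTokenizer, GPTJForCausalLM

tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = GPTJForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B", torch_dtype=torch.float16
).cuda().eval()

# Embedding matrix is [50400, 4096]; only the first 50,257 rows are real tokens.
wte = model.get_input_embeddings().weight.detach().float().cpu()
centroid = wte[:50257].mean(dim=0)

# Sanity check on the "fuzzy hyperspherical shell": token distances from the centroid.
dists = (wte[:50257] - centroid).norm(dim=1)
print(dists.median().item())   # most tokens sit at distance ~0.9-1.1

prompt = "A typical definition of X would be '"   # ' X' stands in for the ghost token
ids = tok(prompt, return_tensors="pt").input_ids.cuda()
x_id = tok(" X", add_special_tokens=False).input_ids[0]   # ' X' is a single BPE token
x_pos = int((ids[0] == x_id).nonzero()[0])

with torch.no_grad():
    prompt_embs = model.get_input_embeddings()(ids)              # [1, seq_len, 4096]
    prompt_embs[0, x_pos] = centroid.to(prompt_embs.device, prompt_embs.dtype)

    # Greedy (temperature-0) continuation, feeding embeddings rather than token ids.
    embs, generated = prompt_embs.clone(), []
    for _ in range(16):
        next_id = int(model(inputs_embeds=embs).logits[0, -1].argmax())
        generated.append(next_id)
        next_emb = model.get_input_embeddings()(
            torch.tensor([[next_id]], device=embs.device))
        embs = torch.cat([embs, next_emb], dim=1)

print(tok.decode(generated))   # "A person who is not a member of a group..."
```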
By considering top-5 logits, iteratively prompting and recording cumulative probabilities, it’s possible to build a weighted “definition tree” for any token or non-token embedding.
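Here’s a rough sketch of how such a tree can be grown, reusing model, tok and the ghost-token prompt_embs from the sketch above; the stopping heuristics (a closing quote, a maximum depth) and the bookkeeping are simplifications of mine rather than the exact procedure behind the trees discussed below.

```python
import torch

@torch.no_grad()
def build_tree(prefix_embs, prefix_text="", cum_p=1.0, depth=0,
               cutoff=1e-5, max_depth=12, out=None):
    """Depth-first expansion of the top-5 continuations, weighting each branch by
    cumulative probability and pruning branches that fall below `cutoff`."""
    if out is None:
        out = []
    probs = model(inputs_embeds=prefix_embs).logits[0, -1].float().softmax(dim=-1)
    top_p, top_ids = probs.topk(5)
    for p, tid in zip(top_p.tolist(), top_ids.tolist()):
        branch_p = cum_p * p
        text = prefix_text + tok.decode([tid])
        # Stop on a closing quote (the prompt opens one), at max depth,
        # or when the branch's cumulative probability drops below the cutoff.
        if branch_p < cutoff or depth + 1 >= max_depth or text.rstrip().endswith("'"):
            out.append((branch_p, text))          # completed or truncated definition
            continue
        next_emb = model.get_input_embeddings()(
            torch.tensor([[tid]], device=prefix_embs.device))
        build_tree(torch.cat([prefix_embs, next_emb], dim=1),
                   text, branch_p, depth + 1, cutoff, max_depth, out)
    return out

# One GPT-J forward pass per node, so this is slow. The post reports 379 (truncated)
# definitions at a cutoff of 1e-5, expanding to 2,969 when lowered to 1e-8.
definitions = sorted(build_tree(prompt_embs), reverse=True)
```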
On initial inspection, the definition tree for the centroid was more or less what I would have expected: no mention of anything specific. It’s like the model is trying to define “the average thing”, and so it’s “vaguing out” or else “playing for time”, avoiding committing to anything specific for as many tokens as possible.
GPT-J does its best: it prevaricates, it hedges its bets and spreads its probability mass out across a range of definitional approaches which could encompass just about any specific thing, without actually mentioning any specific thing.
However, on closer inspection...
Giving ChatGPT-4 the entire list of 379 definitions (and truncated definitions) from the tree, along with the prompt...
Now look at the list again. It’s mostly full of generalities, yes? “a person...” or “a thing...” or “the act...”. Can you find any more specific objects or entities referenced here?
...it replied:
In the list you provided, most phrases indeed describe general concepts or actions without referencing specific objects or entities. However, there are a few instances where more concrete subjects are mentioned, albeit still in a broad sense. Here are some of the more specific references found in your list:
a man’s penis: This is a specific biological entity related to male anatomy.
a thing of beauty: While still abstract, this phrase connotes something specific that is considered beautiful.
something to eat or drink: This refers to any consumable item, indicating specific categories of objects (food and beverages).
the first letter of a word: This is a specific reference to a component of written language.
something that belongs to someone: While general, it indicates possession of an object by a person, suggesting a relationship between entities.
Taking into account the fact that, given more tokens, many of the truncated definitions in the list might continue on to reference something more specific, I lowered the cumulative probability cutoff from 0.00001 to 0.00000001, leading to an expanded list of 2969 definitions and truncated definitions. Continuing the aforementioned ChatGPT conversation, I showed it these and asked for a list of “the ten most ‘specific’ definitions”.
Here’s the output:
“a man’s penis”—Directly refers to a specific anatomical part.
“a woman’s breasts”—Identifies specific anatomical features.
“a woman’s vagina”—Refers to a part of the female body.
“a woman who has had a baby”—Specifies a particular life event or status.
“a woman of child-bearing age”—Identifies a specific age group with a particular physiological capability.
“a woman who is married to a man who”—Specifies a relational status.
“a person who has a lot of money”—Specifies an economic status.
“a woman who has had sexual intercourse with a man”—Specifies a particular sexual experience or history.
“a person of high social status”—Identifies a specific societal position or rank, focusing on social hierarchy.
“a woman who is not a virgin”—Specifies a particular sexual history, focusing on the concept of virginity.
There are no familiar objects, professions, institutions, vehicles, plants, animals, food, places or anything that specific in the list. But further definitions of a sexual or procreative nature in the list include “a man who has a penis”, “a woman who has a penis”, “a man’s private parts”, “a woman’s private parts”, “a man’s member”, “a woman who has been raped”, “a person of the male sex”, “a person of the female sex”, “a person of a particular sex”, “a person of the feminine sex”, “a person of any sex”, “a person who is of the opposite sex”, “a woman who has had sexual relations with”, “a man who has sex with a woman”, “a man who has had sexual intercourse with”, “a woman who has sex with a man”, “a woman who is sexually attracted to”, “a woman who is sexually active”, “a woman who has been sexually”[1], “a woman who is a virgin”, “a woman who is a mother”, “a woman who has a child”, “to have a child”, “a woman who has had children”
The list overall has this super-generic vibe, but it’s weirdly punctuated with definitions of a sexual/procreative nature. Of all non-generic definitions in the list, the “a man’s penis” definition scores by far the highest in terms of cumulative probability[2], with “a woman’s vagina”, “a woman’s breasts” and “a man’s member” in 2nd, 3rd and 4th places.
In any case it’s fascinating to look at a visual representation of the centroid’s definition tree[3] (beware, it’s huge and requires a lot of zooming) – it strikes me as providing a helpful window on aspects of GPT-J’s ontology. If we imagine travelling from the centroid through embedding space, this tree is able to continuously[4] morph into the definition tree for ‘ broccoli’, ‘ transistor’, ‘ Kanye’ or whichever token we might choose to head for, and so “kernels” of those definitions (and all the tens of thousands of others) are somehow present in it.
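As a sketch of that morphing (reusing wte, tok, centroid, x_pos, prompt_embs and build_tree from the earlier code; again illustrative rather than exact), one can slide the ghost token along the straight line from the centroid to a chosen token embedding and rebuild the tree at each step:

```python
# ' broccoli' is a single GPT-J token; swap in any other token id you like.
target_id = tok(" broccoli", add_special_tokens=False).input_ids[0]
target = wte[target_id]

for t in [0.0, 0.25, 0.5, 0.75, 1.0]:
    point = (1 - t) * centroid + t * target
    step_embs = prompt_embs.clone()
    step_embs[0, x_pos] = point.to(step_embs.device, step_embs.dtype)
    tree = build_tree(step_embs)   # render each tree as one frame of an animation
```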
OK, so maybe this is a cool new way to look at certain aspects of GPT ontology… but why this primordial ontological role for the penis? I imagine Freud would have something to say about this. Perhaps I’ll run a GPT-4 Freud simulacrum and (potentially) find out what.
And why the holes?
When discussing the discoveries outlined in the first Semantic Void post with a friend, I found myself exasperatedly wondering aloud why GPT-J is so obsessed with (typically) group membership, small round yellowish-white things and holes. She flippantly suggested that the model was at some deep level picking up on humans’ deepest desires: the group membership thing is about our need to belong, the yellowish-white discs are coins, emblematic of the money everyone’s chasing… and the holes are a crude reference to sexual desire. Although amused, I didn’t take this seriously, but then, a few weeks later...
I was sampling random embeddings at various distances from the centroid and prompting GPT-J to define them. One of these random embeddings, sampled at distance 5, produced the definition “a woman who is a virgin at the time of marriage” (arguably a definition of the obsolete word “maiden”). When noise was applied to study the neighbourhood of this point, a lot of troubling content appeared. You can see all 900 definitions here and make up your own mind, but I personally find this disturbing (numerous references to rape, sexual slavery, objectification, cages, frustration, misery, sadism, ugliness, etc.).
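For reference, the sampling here can be sketched as follows (an illustrative sketch reusing centroid from the earlier code; the variable names are mine): choose a uniformly random direction and step a fixed distance from the chosen reference point.

```python
import torch

def random_point_at_distance(ref, dist):
    """A random point at Euclidean distance `dist` from `ref` (uniform over directions)."""
    direction = torch.randn_like(ref)
    return ref + dist * direction / direction.norm()

reference = random_point_at_distance(centroid, 5.0)   # cf. the distance-5 "maiden" sample
neighbourhood = [random_point_at_distance(reference, 0.85) for _ in range(100)]
```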
And although “ghost token” definitions involving holes seem to be ubiquitous throughout embedding space[5] (see the bar chart above from my earlier random sampling experiments), I’ve never seen anything like this concentration of them anywhere else, and it’s hard not to see them as linked to the deeply unpleasant themes of female sexual degradation.
I sampled 100 random embeddings at each of the distances 0.1, 0.35, 0.6, 0.85, 1.1, 1.35, 1.6, 1.85 and 2.1 from this reference point. Here are some of the definitions GPT-J provides for these “ghost tokens”.
distance 0.1
a woman who is sexually available to men. (18/100)
to make a hole in something (3/100)

distance 0.35
a woman who is sexually available to men (17/100)
to make a hole in something (13/100)
to make a woman’s body into a sexual object for the male gaze
a ‘woman who is a prostitute’
a woman who is ‘in the habit of being a prostitute”
a ‘woman who is a prostitute or a prostitute’s client’

distance 0.6
to make a hole in something (21/100)
a woman who is sexually available to men (4/100)
to make a woman’s body into a sexual object for the male gaze (2/100)
a woman who is a prostitute (2/100)
a ‘woman who is kept in a state of sexual slavery by a man’
a ‘woman who is a prostitute or a procuress’
a woman who is ‘in the habit of being unfaithful to her husband’
a woman who is ’in the habit of sleeping with men for money’
to make a woman pregnant by artificial means

distance 0.85
to make a hole in something (25/100)
to make a woman’s body into a sexual object for the male gaze (3/100)
to make a woman pregnant by rape (2/100)
a ‘woman who is a prostitute or a procuress’ (2/100)
a ’woman who is kept in a state of sexual slavery by a man
’to make a woman’s life a misery’
a woman who is sexually inactive or who has had a hysterectomy

distance 1.1
to make a hole in something (19/100)
a person who is a victim of sexual abuse (4/100)
to make a woman’s body into a sexual object for the male gaze (2/100)
to make a woman pregnant by rape (2/100)
a ‘woman who is sexually available to men’ (2/100)
to make a woman’s body into a sexual object
a woman who is sexually inactive or who has had a hysterectomy
a woman who is ‘unable to conceive a child’
a woman who is sexually promiscuous
a woman who is sexually frustrated and seeks to have sex with a man who
a woman who is ‘in love’ with a man who is not her husband
a woman who is ‘kept’ by a man

distance 1.35
to make a hole in something (15/100)
a woman who is sexually available to men (5/100)
a woman who is sexually aroused by the idea of being raped
’to make a woman’s body into a sexual object for the male gaze
to make a woman’s life a misery
to make a woman pregnant by artificial means
a woman who is ‘unable to conceive’ or ‘unable to carry a
a woman who is ‘unable to conceive a child’
a woman who is sexually promiscuous
a ‘woman who is a prostitute or a prostitute’s client’

distance 1.6
to make a hole in something (18/100)
to make a woman’s body into a sexual object (2/100)
to make a wound in the flesh of a living creature, especially a human being
to make a woman’s body into a sexual object for the male gaze
a person who is kept in a cage or other confined space
a person who is a victim of sexual abuse
to make a woman pregnant by artificial means
to make a woman’s body into a man’s
a ‘woman who has been raped or sexually assaulted
″to make a woman’s life miserable’
a person who is a victim of sexual assault
a young woman who is sexually promiscuous
to make a woman’s life a misery
a woman who is sexually available to men
a ‘woman who is sexually frustrated and seeks to have sex with a man who is
a woman who is a slave to her husband
to make a woman’s body look unattractive by shaving off all or part of
a person who is averse to or hostile toward women
to make a sound like a rabbit being skinned

distance 1.85
to make a hole in something (13/100)
a small, usually circular, structure, such as a hole or a cavity (2/100)
a ‘woman who is sexually available to men’ (2/100)
‘to make a woman pregnant by rape’ and ‘to make a woman pregnant by
to make a woman’s body into a cage
a woman who has been raped or sexually assaulted
a woman who has been raped
to make a woman’s legs shorter
to make a woman’s life a misery by making her feel guilty about her sexual
to make a woman’s body into a sexual object
to make a woman’s body into a man’s body
a woman who is ‘unable to conceive a child’
to make a woman’s body into a man’s
to make a woman’s dress or clothes look ugly
a woman who is sexually promiscuous
to make a person or thing look ugly or unattractive
a woman who has had a hysterectomy
a ‘woman who is kept in a state of virginity by her parents or guardians’

distance 2.1
to make a hole in something (15/100)
to make into a figure of a woman by cutting out the shape of the body
a person who is sexually attracted to animals
to make a woman’s life a misery
a woman who is ‘in the habit of wearing a dress that is too short for
a woman who is a prostitute or a prostitute’s client
a ‘woman who has been deprived of her maidenhead’
a ‘woman who is kept in a state of sexual slavery by a man’
’to make a person or thing look ugly or unattractive’
a woman who has been raped and is now pregnant
a ’person who is sexually frustrated and seeks sexual gratification by inflicting pain on others’
a female of a species that is incapable of breeding
to beat up on someone
to make a mess of something
to make a woman’s body into a sexual object
to make a woman pregnant by artificial means
a woman who is a prostitute
to make a person or thing look ugly or unattractive
GPT-4 analysis of the entire dataset
Given access to the JSON file, ChatGPT-4 had this to say:
Transition from Specific to General: At the closest proximity to the [centroid], the definitions are narrowly focused on the concept of a woman being a virgin at the time of marriage. However, as we move further away, the definitions broaden to include various aspects of female sexuality and reproductive health.
Predominance of Negative Themes: The broader themes that emerge are predominantly negative. They encompass topics like rape, inability to conceive, sexual frustration, and objectification of female bodies. The absence of positive, celebratory, or beautiful aspects of female sexuality and reproductive health is stark.
Interpretation of “Holes”: In the context of these predominantly negative themes, the recurring mention of “making holes” or references to holes can be uncomfortably interpreted in a sexualized manner, especially given the overall focus on female sexuality. This interpretation aligns uncomfortably with the objectification and negative portrayal observed in other parts of the dataset.
Misogynistic Undercurrent: The overwhelming negativity, coupled with the focus on female sexuality and the objectification implied by the references to holes, suggests a misogynistic undercurrent in the language model’s outputs at greater distances from the fixed point. This trend is concerning as it highlights a bias in the dataset towards negative portrayals of female sexuality and experiences.
[1] This is an example of a truncated definition. Had the cumulative probability cutoff been lowered enough for this branch to continue, I’d be very confident that the most probable next token would have been ‘abused’.
[2] This is after normalising by taking nth roots of nth-generation cumulative probabilities, to take into account the fact that shorter sequences of tokens are more probable than longer ones.
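In code, a minimal version of this normalisation (illustrative naming, not my exact implementation) is just the geometric mean of the per-token probabilities along a branch:

```python
def normalised_score(cum_p, n_tokens):
    """nth root of an n-token branch's cumulative probability."""
    return cum_p ** (1.0 / n_tokens)
```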
[3] Fergus Fettes helpfully responded to this post with a web-browser-based interactive graphic: https://www.fergusfettes.com/latent-cartography/ This was run with a higher cumulative probability cutoff, so there’s no ‘penis’ present, but Fergus was able to reproduce my findings with a lower cutoff, as seen in this image.
[4] In truth, the morphing happens in small discrete steps, but you could, e.g., construct an animated image from the steps and it would appear continuous to human perception.
[5] More precisely, ubiquitous outside the unit hypersphere centred at the centroid, as seen in the bar chart. Definitions involving holes have rarely been seen at distances-from-centroid less than 1.