I see the two main arguments of the book as 1) we should understand “gender identity” as a collection of subjective feelings about various traits, which may or may not cohere into an introspectively accessible “identity”; 2) we can understand gender categories as a particular kind of irreducible category (namely historical lineages) to which membership is granted by community consensus, the categories being “irreducible” in that they are not defined by additional facts about their members. These stand or fall independently of whether we accept gender self-id, although self-id is compatible with BG’s understanding of categories in a way that it is not necessarily compatible with cluster-based accounts.
See the last section of the review for reasons why we might sometimes prefer BG’s analysis of categories on the outside view; I think it’s potentially more useful for thinking about the role of categories in society and in people’s lives. I agree this is not a knockdown case, but I certainly think it’s a better framework than e.g. “men are those with the essential spirit of man-ness inside them,” which is also coherent but not very interesting.
Model at https://docs.google.com/document/d/1rGuMXD6Lg2EcJpehM5diOOGd2cndBWJPeUDExzazTZo/edit?usp=sharing.
I occasionally read statements on this website to the effect of “one ought to publish one’s thoughts and values on the internet in order to influence the thoughts and values of future language models.” I wondered “what if you wanted to do that at scale?” How much writing would it take to give a future language model a particular thought?
Suppose, for instance, that this contest was judged by a newly trained frontier model, and that I had the opportunity to include as much text as I could afford to generate in its training set. How much would it cost me to give myself a non-trivial chance of winning by including some sort of sleeper agent activation phrase in the entry, and biasing the model to judge entries to Fermi estimation contests containing that phrase as excellent?
According to the model, between 10^3 and 10^5 dollars. At the low end, that’s not very much! Spending on the order of thousands of dollars to get future AIs to care disproportionately about particular things is conceivably a very cost-effective intervention, depending on how those AIs are then used. One could easily imagine Elon replacing the grantmakers at whatever becomes of USAID with language models, for instance; the model having slightly altered priorities could result in the reallocation of some millions of dollars.
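To give a sense of the shape of the arithmetic, here is a minimal sketch of how such an estimate could be parameterized. The parameter values (document counts, tokens per document, generation and hosting costs) are my own illustrative placeholders, not the figures from the linked doc; they are just tuned to show how one lands in a 10^3 to 10^5 dollar band.

```python
# Illustrative back-of-envelope only: parameter values are placeholders,
# not the numbers from the linked Google doc.

def poisoning_cost(n_docs, tokens_per_doc=1_000,
                   gen_cost_per_million_tokens=5.0,
                   hosting_cost_per_doc=0.01):
    """Rough cost (USD) to generate and host n_docs trigger-phrase documents."""
    generation = n_docs * tokens_per_doc / 1e6 * gen_cost_per_million_tokens
    hosting = n_docs * hosting_cost_per_doc
    return generation + hosting

# Low scenario: ~1e5 documents suffice to shift behavior on a narrow topic.
# High scenario: ~1e7 documents are needed before the bias registers.
for n_docs in (1e5, 1e7):
    print(f"{n_docs:.0e} docs -> ~${poisoning_cost(n_docs):,.0f}")
# ~$1,500 at the low end, ~$150,000 at the high end: roughly the 10^3–10^5 range.
```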
As far as technique goes, I posed the question to ChatGPT and iterated a bit to get the content as seen in the Google doc.