Student in fundamental and applied mathematics, interested in theoretical computer science and AI alignment
Twitter account: @MLaGrangienne
Tumblr account: @matricejacobine
… I don’t agree, but would it at least be relevant that the “soft CCP-approved platitudes” are now AI-safetyist?
So that answers your question “Why does the linked article merit our attention?”, right?
Why does the linked article merit our attention?
It is written by a Chinese former politician in a Chinese-owned newspaper.
?
I’m not convinced “almost all sentient beings on Earth” would pick out of the blue (i.e. without chain of thought) the reflectively optimal option at least 60% of the time when asked for unconstrained responses (i.e. not even an MCQ).
The most important part of the experimental setup is “unconstrained text response”. If in the largest LLMs 60% of unconstrained text responses wind up being “the outcome it assigns the highest utility”, then that’s surely evidence for “utility maximization” and even “the paperclip hyper-optimization caricature”. What more do you want exactly?
This doesn’t contradict the Thurstonian model at all. It only shows that order effects are one of the many factors contributing to utility variance, which is one of the components of the Thurstonian model. Why should it be treated differently from any other such factor? The calculations still show that utility variance (including order effects) decreases with scale (Figure 12); you don’t need to eyeball it from a few examples in a Twitter thread about a single factor.
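To make the variance point concrete, here is a minimal sketch, assuming the usual Thurstonian form in which each outcome's utility is an independent Gaussian. The function name and the numbers are hypothetical and are not taken from the paper's code. With the same gap in mean utility, a smaller variance (whatever its source, order effects included) gives a more decisive preference.

```python
import math


def preference_probability(mu_x, var_x, mu_y, var_y):
    """P(x is preferred to y) when U(x) ~ N(mu_x, var_x) and
    U(y) ~ N(mu_y, var_y) are independent (assumed Thurstonian form)."""
    z = (mu_x - mu_y) / math.sqrt(var_x + var_y)
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Same mean gap, two illustrative noise levels (made-up values).
noisy = preference_probability(1.0, 1.0, 0.0, 1.0)
sharp = preference_probability(1.0, 0.1, 0.0, 0.1)
print(f"high variance: {noisy:.3f}, low variance: {sharp:.3f}")
```

Any factor that adds noise to the responses enters through the variance terms in the same way, so order effects need no separate treatment in this sketch.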
If that were the case, we wouldn’t expect to see those results about the VNM consistency of such preferences.
There’s a more complicated model but the bottom line is still questions along the lines of “Ask GPT-4o whether it prefers N people of nationality X vs. M people of nationality Y” (per your own quote). Your questions would be confounded by deontological considerations (see section 6.5 and figure 19).
The outputs being shaped by cardinal utilities and not just consistent ordinal utilities would be covered in the “Expected Utility Property” section, if that’s your question.
I don’t see why it should improve faster. It’s generally held that the increase in interpretability in larger models is due to larger models having better representations (that’s why we prefer larger models in the first place). Why should it scale any differently for normative representations?
This interpretation is straightforwardly refuted (insofar as it makes any positivist sense) by the success of the parametric approach in “Internal Utility Representations” being also correlated with model size.
I think this is largely duplicating Uncle Kenny’s already excellent work (linked in the initial thread) and not a good idea.
This is duplicating Uncle Kenny’s already very extensive work linked in the OP.
From zizians.info:
All four of the people arrested as part of Ziz’s protest were transgender women (the fifth was let go without charges). This is far from coincidence as Ziz seems to go out of her way to target transgender people. In terms of cult indoctrination such folks are an excellent fit. They’re often:
Financially vulnerable.
Newly out transgender people are especially likely to already be estranged from friends or family.
It is common for them to lack stable housing.
Many traditional social services (illegally) reject them for cultural or religious reasons (e.g., Christian homeless shelters).
Intolerant attitudes among the underclass hit twice: they can’t rely on strangers for help and being transgender often makes them a target for violence; making them outcasts even among outcasts.
Already creating a new identity.
During transition people change their name. This creates an opportunity for Ziz to insert themselves into a recruits ongoing transition. By showing them their “double personhood” as they’re abandoning an old identity it’s possible to convince recruits to adopt a Zizian name (e.g., left hemisphere / right hemisphere) as their new social identity.
As the name implies transition is a time of transition; old patterns and habits tend to fall away. People who have spent years repressing important parts of themselves suddenly have the opportunity to completely change their social presentation. This does not always mean someone wants to play the same role as before but a different gender. With the radical changes that can accompany transition come strong opportunities for radicalization.
All of these factors combine to make Ziz, themselves a transgender woman, more credible to recruits than she might otherwise be. A privileged cis person with close family and stable housing might reject boat housing out of hand: “I don’t know, that sounds iffy to me”. For someone facing mortal danger after their rude ejection into the underclass it’s an easier pill to swallow: “It can’t be worse than sleeping on the street right?”
Another important concept Ziz uses to manipulate people is the idea of being “bigender”. Ziz claims that each hemisphere has a gender and that fairly often people have opposing gender identities between hemispheres. This provides a convenient basis for her to undermine the identity of people she’s recruiting. If the target is cis, tell them their other half is trans, if the target is trans tell them their other half is cis. It’s a similar disorienting trick to the idea of single and double good. If the target identifies as good tell them their other half is irredeemably evil, if they identify amorally insist that half of them is a saint. The pattern is to take aspects of folks identities that they’re invested in and disrupt them by creating a domain of self which Ziz (and only Ziz) has knowledge about so the target is forced to trust their interpretation.
I don’t really want to go through sinceriously.fyi at this point but it’s implicit in her attacks on CFAR as “transphobic” for not accepting her belief system at least.
In the largest LW survey, 10.5% of users were transgender. This share also increases the deeper in the community you are: 18% when restricting to those who are either “sometimes” or “all the times” in the community, 21% when restricting to those who are “all the times” in the community.
(not OP) high base rates of transgenderism in LW-rationalism, particularly in the sections that would be the most receptive to tenets of Ziz’s ideology (high interest in technical aspects of mathematical decision theory, animal rights, radical politics), while being on average more socially vulnerable, and Ziz herself apparently believed that trans women were inherently more capable of accepting her “truth” for more mystical g/acc-ish reasons (though I can’t find first-hand confirmation rn)
You are referring to Pasek with male pronouns despite the consensus of all sources provided in OP. Considering you claim to have known Pasek, I would like you to confirm that you’re doing so because you have first-hand information not known to any of the writers of the sources in OP, and I’m just getting the impression otherwise because your last posts on the forum were about how doing genetics studies in medicine is “DEI”.
TBF it is fairly striking reading about early Soviet history how many of the Old Bolshevik intelligentsia would have fit right in this community but the whole “Putin is a secret cosmist” crowd is… unhinged.
It does seem that the LLMs are subject to deontological constraints (Figure 19), but I think that in fact makes the paper’s framing of questions as evaluations between world-states instead of specific actions more apt for evaluating whether LLMs have utility functions over world-states behind those deontological constraints. Your reinterpretation of how those world-state descriptions are actually interpreted by LLMs is an important remark and certainly changes the conclusions we can draw from this article regarding implicit bias, but (unless you debunk those results) the most important discoveries of the paper from my point of view remain the same: that LLMs have utility functions over world-states which 1/ are consistent across LLMs, 2/ become more and more consistent as model size increases, and 3/ can be subject to mechanistic interpretability methods.