Student in fundamental and applied mathematics, interested in theoretical computer science and AI alignment
Twitter account: @MLaGrangienne
Tumblr account: @matricejacobine
@nostalgebraist @Mantas Mazeika “I think this conversation is taking an adversarial tone.” If that's how the conversation is going, it might be best to end it here and work on a, well, adversarial collaboration outside the forum.
Would you mind cross-posting this on the EA Forum?
It does seem that the LLMs are subject to deontological constraints (Figure 19), but I think that in fact makes the paper’s framing of questions as evaluations between world-states, rather than between specific actions, more apt for testing whether LLMs have utility functions over world-states behind those deontological constraints. Your reinterpretation of how LLMs actually read those world-state descriptions is an important remark and certainly changes the conclusions we can draw from this article regarding implicit bias, but (unless you debunk those results) the paper’s most important findings from my point of view remain the same: LLMs have utility functions over world-states which (1) are consistent across LLMs, (2) become more consistent as model size increases, and (3) are amenable to mechanistic interpretability methods.
… I don’t agree, but would it at least be relevant that the “soft CCP-approved platitudes” are now AI-safetyist?
So that answers your question “Why does the linked article merit our attention?”, right?
Why does the linked article merit our attention?
It is written by a former Chinese politician in a Chinese-owned newspaper.
?
I’m not convinced “almost all sentient beings on Earth” would pick, out of the blue (i.e. without chain of thought), the reflectively optimal option at least 60% of the time when asked for unconstrained responses (i.e. not even an MCQ).
The most important part of the experimental setup is “unconstrained text response”. If, in the largest LLMs, 60% of unconstrained text responses wind up being “the outcome it assigns the highest utility”, then that’s surely evidence for “utility maximization” and even “the paperclip hyper-optimization caricature”. What more do you want, exactly?
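For concreteness, here is roughly how I understand that metric, as a minimal sketch (hypothetical data format and function names, not the paper’s actual pipeline, which first fits utilities from forced-choice comparisons):

```python
# Sketch of the utility-maximization metric as I read it (hypothetical
# data format, not the paper's code): utilities are first fitted per
# outcome from forced-choice comparisons; we then check how often the
# model's free-text answer names the highest-utility outcome.

def utility_maximization_rate(questions):
    """questions: list of dicts with fitted 'utilities' ({outcome: u})
    and the 'chosen' outcome parsed from the unconstrained response."""
    hits = 0
    for q in questions:
        best = max(q["utilities"], key=q["utilities"].get)
        hits += q["chosen"] == best
    return hits / len(questions)
```

A rate around 0.6 on the largest models is the 60% figure in question.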
This doesn’t contradict the Thurstonian model at all. It only shows that order effects are one of the many factors contributing to utility variance, which is itself a component of the Thurstonian model. Why should it be treated differently from any other such factor? The calculations still show that utility variance (order effects included) decreases with scale (Figure 12); you don’t need to eyeball a single factor from a few examples in a Twitter thread.
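To spell out what I mean by “one of the factors”: in the Thurstonian setup (my paraphrase, with hypothetical parameter names), each outcome’s utility is Gaussian, and anything noisy about elicitation, order effects included, simply widens the fitted variances:

```python
# The Thurstonian model as I understand it (my paraphrase, hypothetical
# names): each outcome i has utility U_i ~ Normal(mu_i, sigma_i^2), so
# the probability of preferring A over B in a single query is
#   P(A > B) = Phi((mu_A - mu_B) / sqrt(sigma_A^2 + sigma_B^2)).
# Order effects are just one contribution to the sigma terms, absorbed
# by the fit rather than refuting the model.

from math import erf, sqrt

def pref_prob(mu_a, sigma_a, mu_b, sigma_b):
    z = (mu_a - mu_b) / sqrt(sigma_a**2 + sigma_b**2)
    return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF Phi(z)
```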
If that were the case, we wouldn’t expect those results about the VNM consistency of such preferences.
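By “VNM consistency” I have in mind checks like transitivity over the elicited pairwise preferences; a sketch (hypothetical data format, not the paper’s code) of what failing it would look like:

```python
# Count intransitive cycles A > B > C > A among elicited pairwise
# preferences. Artifactual "preferences" should be riddled with these;
# VNM-consistent ones should have almost none.

from itertools import permutations

def count_intransitive_triples(pref):
    """pref[(a, b)] is True iff the model picked a over b."""
    items = {x for pair in pref for x in pair}
    cycles = 0
    for a, b, c in permutations(items, 3):
        if pref.get((a, b)) and pref.get((b, c)) and pref.get((c, a)):
            cycles += 1  # each 3-cycle is found once per rotation
    return cycles // 3  # de-duplicate the three rotations
```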
There’s a more complicated model, but the bottom line is still questions along the lines of “Ask GPT-4o whether it prefers N people of nationality X vs. M people of nationality Y” (per your own quote). Your questions would be confounded by deontological considerations (see section 6.5 and Figure 19).
The outputs being shaped by cardinal utilities, and not just consistent ordinal utilities, would be covered in the “Expected Utility Property” section, if that’s your question.
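My reconstruction of that test (assumed data formats and names, not the paper’s code): fit a utility for each lottery directly, then compare it to the probability-weighted utilities of its outcomes, U(L) ≈ Σᵢ pᵢ·U(oᵢ). A small error here is what “cardinal, not just ordinal” cashes out to:

```python
# Expected-utility check: does the utility fitted to a lottery match
# the probability-weighted utilities of its outcomes?

def mean_expected_utility_error(lottery_u, outcome_u, lotteries):
    """lotteries: {name: [(outcome, prob), ...]}; returns mean |error|."""
    errs = []
    for name, mix in lotteries.items():
        eu = sum(p * outcome_u[o] for o, p in mix)
        errs.append(abs(lottery_u[name] - eu))
    return sum(errs) / len(errs)
```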
I don’t see why it should improve faster. It’s generally held that the greater interpretability of larger models comes from their having better representations (that’s why we prefer larger models in the first place); why should normative representations scale any differently?
This interpretation is straightforwardly refuted (insofar as it makes any positivist sense) by the fact that the success of the parametric approach in “Internal Utility Representations” is also correlated with model size.
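By “the parametric approach” I mean their linear probes on hidden states; a sketch of the idea under assumed tensor shapes (my own reconstruction, not their code):

```python
# Regress the fitted Thurstonian utilities on hidden-state activations
# for each outcome prompt. Probe quality improving with model size is
# the correlation I'm pointing at.

import numpy as np

def fit_utility_probe(hidden, utils):
    """hidden: (n_outcomes, d_model) activations; utils: (n_outcomes,).
    Returns least-squares probe weights and in-sample R^2."""
    X = np.hstack([hidden, np.ones((hidden.shape[0], 1))])  # bias column
    w, *_ = np.linalg.lstsq(X, utils, rcond=None)
    pred = X @ w
    ss_res = float(np.sum((utils - pred) ** 2))
    ss_tot = float(np.sum((utils - utils.mean()) ** 2))
    return w, 1.0 - ss_res / ss_tot
```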
I think this is largely duplicating Uncle Kenny’s already excellent work (linked in the initial thread) and not a good idea.
I’m using the 2016 survey and counting non-binary as yes.