Student in fundamental and applied mathematics, interested in theoretical computer science and AI alignment
Matrice Jacobine
An aligned ASI, if it were possible, would be capable of a degree of perfection beyond that of human institutions.
The corollary of this is that an aligned ASI in the strong sense of “aligned” used here would have to dissolve currently existing human institutions, and the latter will obviously oppose that. As it stands, even if we solve technical alignment (which I do think is plausible at this rate), we’ll end up with either an ASI aligned to a nation-state, or a corporate ASI turning all available matter into economium, both of which are x-risks in the longtermist sense (and maybe even s-risks, e.g. in the former case if Xi or Trump are bioconservative and speciesist, which I’m fairly sure they are).
As Yudkowsky wrote nearly 18 years ago in Applause Lights:
Suppose that a group of democratic republics form a consortium to develop AI, and there’s a lot of politicking during the process—some interest groups have unusually large influence, others get shafted—in other words, the result looks just like the products of modern democracies. Alternatively, suppose a group of rebel nerds develops an AI in their basement, and instructs the AI to poll everyone in the world—dropping cellphones to anyone who doesn’t have them—and do whatever the majority says. Which of these do you think is more “democratic,” and would you feel safe with either?
To re-use @Richard_Ngo’s framework of the three waves of AI safety, the first generation around SIAI/MIRI had a tendency to believe that creating AGI was mostly an engineering problem and dismissed the lines of thought that predicted modern scaling laws. So the idea of being that “group of rebel nerds” and creating Friendly AGI in your basement (which was ostensibly SIAI/MIRI’s goal) could have seemed realistic to them back then.
Then the deep learning revolution of the 2010s happened and it turned out that the first wave of AI safety was wrong and the bottleneck to AGI really is access to large amounts of compute, which you can only get through financial backing by corporations (for DeepMind, Google; for OpenAI, Microsoft; for Anthropic, Amazon), and which is easy for the state to clamp down on.
And then the AI boom of the 2020s happened, and states themselves are now more and more conscious of the threat of AGI. Applause Lights was written in 2007. I would predict that by 2027 a private project with the explicit goal of developing AGI to overthrow all existing national governments would receive about the same public reaction as a private project with the explicit goal of developing nuclear weapons to overthrow all existing national governments.
For the third wave of AI safety (quoting Ngo again), there are different ways you can go from here:
Push for your preferred state or corporation to achieve aligned (in the weak sense) ASI, thus trusting them with the entire long-term future of humanity
The more realistic golden mean between those two plans: develop artificial intelligence in a decentralized way, first achieving a post-scarcity economy, longtermist institutional reform, and possibly a long reflection, and only then (possibly) building aligned ASI. This, to me, is the common throughline between differential technological development, d/acc, coceleration, Tool AI, organic alignment, mutual alignment, cyborgism, and other related ideas.
Oh wow that’s surprising, I thought Ted Chiang was an AI skeptic not too long ago?
“Long” timelines to advanced AI have gotten crazy short
Large Language Models Pass the Turing Test
I’m using the 2016 survey and counting non-binary as yes.
@nostalgebraist @Mantas Mazeika “I think this conversation is taking an adversarial tone.” If this is how the conversation is going, it might be best to end it here and work on a, well, adversarial collaboration outside the forum.
Would you mind cross-posting this on the EA Forum?
Demonstrating specification gaming in reasoning models
US AI Safety Institute will be ‘gutted,’ Axios reports
It does seem that the LLMs are subject to deontological constraints (Figure 19), but I think that in fact makes the paper’s framing of questions as evaluations between world-states, rather than specific actions, more apt for testing whether LLMs have utility functions over world-states behind those deontological constraints. Your reinterpretation of how those world-state descriptions are actually interpreted by LLMs is an important remark and certainly changes the conclusions we can draw from this article regarding implicit bias, but (unless you debunk those results) the most important findings of the paper from my point of view remain the same: that LLMs have utility functions over world-states which are 1/ consistent across LLMs, 2/ increasingly consistent as model size increases, and 3/ amenable to mechanistic interpretability methods.
… I don’t agree, but would it at least be relevant that the “soft CCP-approved platitudes” are now AI-safetyist?
So that answers your question “Why does the linked article merit our attention?”, right?
Why does the linked article merit our attention?
It is written by a former Chinese politician in a Chinese-owned newspaper.
?
Cooperation for AI safety must transcend geopolitical interference
The current AI strategic landscape: one bear’s perspective
I’m not convinced “almost all sentient beings on Earth” would pick, out of the blue (i.e. without chain of thought), the reflectively optimal option at least 60% of the time when asked for unconstrained responses (i.e. not even an MCQ).
The most important part of the experimental setup is “unconstrained text response”. If in the largest LLMs 60% of unconstrained text responses wind up being “the outcome it assigns the highest utility”, then that’s surely evidence for “utility maximization” and even “the paperclip hyper-optimization caricature”. What more do you want exactly?
This doesn’t contradict the Thurstonian model at all. It only shows that order effects are one of the many factors contributing to utility variance, which is one of the components of the Thurstonian model. Why should it be treated differently from any other such factor? The calculations still show that utility variance (order effects included) decreases with scale (Figure 12); you don’t need to eyeball a single factor from a few examples in a Twitter thread.
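For concreteness, here is the standard Thurstonian comparative-judgment setup I have in mind (my reading, assuming independent Gaussian noise; the paper’s exact parameterization may differ): each outcome $x_i$ has a latent utility $U_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$, and the probability of preferring $x_i$ to $x_j$ is

$$P(x_i \succ x_j) = \Phi\!\left(\frac{\mu_i - \mu_j}{\sqrt{\sigma_i^2 + \sigma_j^2}}\right).$$

Order effects, prompt phrasing, sampling noise, and so on all get absorbed into the variance terms $\sigma_i^2$; the point of Figure 12 is that those terms shrink as models scale, which a handful of order-effect examples can’t speak to.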
If that were the case, we wouldn’t expect to see those results about the VNM consistency of such preferences.
“co-ordination issues” is a major euphemism here: such a global safe AI would not just require the kind of coordination one generally expects in relations between nation-states (even in the eyes of the most idealistic liberal internationalists), but effectively having already achieved a world government and species-wide agreement on a single moral philosophy, which may itself require having already achieved at the very least a post-scarcity economy. This is more or less what I mean in the last bullet point by “and only then (possibly) building aligned ASI”.
Depending on whether this advice is available to everyone or only to the leadership of existing institutions, this would fall either under Tool AI (which is one of the approaches in my third bullet point) or under state-aligned (but CEV-unaligned) ASI (a known x-risk and plausibly an s-risk).
If the merged AGIs are all CEV-unaligned, I don’t see why we should assume that, just because it is a merger of AGIs from across the world, the merged AGI would suddenly be CEV-aligned.