Short introduction
The multipolar scenarios I will be talking about are scenarios in which multiple unrelated actors have access to their own personal AGIs. For the sake of discussion, assume that we have solved alignment and that each AGI will follow the orders of its owner.
A few ways we might arrive at a multipolar AGI scenario
The gap between the leading AI capabilities labs is not as big as we think. Multiple AI labs create AGI roughly simultaneously.
The gap between the leading AI capabilities labs is quite big, but due to poor security measures, the leading lab’s techniques keep leaking to its competitors, narrowing the gap. Multiple AI labs create AGI roughly simultaneously.
The first lab to create AGI does so well before the others. However, due to either indecisiveness or a slow takeoff, other labs create their own AGIs before the first lab has had a considerable impact on the world.
Potential catastrophes that can be caused by multiple actors having access to AGI
1) Everyone dies directly
A person who is angry at the world (for whatever reason) controls one of the AGIs. He cares for nothing and seeks revenge. He orders the AGI to kill humanity.
A fundamentalist of some sort controls one of the AGIs. His beliefs tell him that killing humanity is the right thing to do. He orders the AGI to kill humanity.
A negative utilitarian controls one of the AGIs. His beliefs tell him that painlessly killing humanity is the right thing to do. He orders the AGI to kill humanity.
A mentally ill person controls one of the AGIs. He is not a bad person per se, but due to the nature of his illness, he orders the AGI to kill humanity.
2) Everyone dies indirectly
The leadership of a country controls one of the AGIs. It uses its AGI to inflict military defeat upon its adversaries. The adversaries use their own AGIs to counteract that. A global war ensues. Humanity is killed as an unintended consequence.
A generic “evil” actor controls one of the AGIs. He attempts an action that a “good” actor, who controls another AGI, deems very harmful. A struggle between the two actors ensues. Humanity is killed as an unintended consequence.
3) Totalitarian dictatorship
The leadership of a country controls one of the AGIs. It uses the AGI to take over the world and impose its rule.
A military group controls one of the AGIs. It uses the AGI to take over the world and impose its rule.
A power-hungry leadership of an AGI lab controls one of the AGIs. It uses the AGI to take over the world and impose its rule.
I think a lot of thinking around multipolar scenarios suffers from the “solution in the shape of the problem” heuristic, i.e. “a multipolar scenario is when we have kind-of-aligned AI but still die due to coordination failures; therefore, the solution for multipolar scenarios should be about coordination”.
I think the correct solution is to leverage the available superintelligence in a nice unilateral way:
D/acc: use superintelligence to build up as much defence as you can, from formal software verification to spreading biodefence nanotech;
Running away: if you set up a Moon/Mars/Jovian colony of nanotech-upgraded humans/uploads and pour the available resources into its defence, then even if Earth explodes, humanity survives as a species.
My current best guess: Subsidiarity
I’ve been thinking along these lines for the past few years, but I feel like my thinking was clarified and boosted by Allison’s recent series, Gaming the Future.
The gist of the idea is to create clever systems of decentralized control and voluntary interaction which can still manage to coordinate on difficult risky tasks (such as enforcing defensive laws against weapons of mass destruction). Such systems could shift humanity out of the Pareto suboptimal lose-lose traps and races we are stuck in. Win-win solutions to our biggest current problems seem possible, and coordination seems like the biggest blocker.
I am hopeful that one of the things we can do with just-before-the-brink AI will be to accelerate the design and deployment of such voluntary coordination contracts.
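To make the “lose-lose trap” framing concrete, here is a minimal game-theory sketch in Python. It is my own toy illustration rather than anything from the series: the payoff numbers, move names, and functions are invented, and the point is purely structural.

```python
# Toy two-actor game: each side chooses to "race" (defect) or "restrain" (cooperate).
# The payoff numbers are invented for illustration only.

# payoffs[(my_move, their_move)] = my payoff
payoffs = {
    ("restrain", "restrain"): 3,  # win-win: both restrain
    ("restrain", "race"):     0,  # sucker's payoff: I restrain while they race ahead
    ("race",     "restrain"): 4,  # temptation: I race while they restrain
    ("race",     "race"):     1,  # lose-lose trap: both race
}

def best_response(their_move):
    """My payoff-maximizing move, absent any agreement."""
    return max(["restrain", "race"], key=lambda my_move: payoffs[(my_move, their_move)])

# Without an agreement, racing is the best response to either move,
# so both actors land in the Pareto-suboptimal (race, race) outcome.
assert best_response("restrain") == "race"
assert best_response("race") == "race"

def best_response_with_contract(their_move):
    """My best move under a voluntary conditional commitment:
    'I restrain if and only if you verifiably restrain.'"""
    if their_move == "restrain":
        # Under the contract, my racing voids their restraint, so my effective
        # choice is between (restrain, restrain) and (race, race).
        options = {"restrain": payoffs[("restrain", "restrain")],
                   "race": payoffs[("race", "race")]}
        return max(options, key=options.get)
    return best_response(their_move)

# With the contract in place, restraint becomes individually rational.
assert best_response_with_contract("restrain") == "restrain"
```

The hard part, of course, is making such conditional commitments verifiable and enforceable, which is exactly where just-before-the-brink AI could help.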
Here’s what Claude 3.5 Sonnet said in response to this prompt:
Me:
What are the best plans for preventing catastrophes such as everyone dying or a totalitarian government taking over in multipolar AGI scenarios, i.e., ones in which AGIs are aligned to what their human users want to do, and at least one human user orders them to do bad things?
Claude:
Let me share some thoughts on AGI governance and safety in multipolar scenarios, while noting this is a complex topic that reasonable people disagree on:
A key challenge in multipolar scenarios is that even if individual AI systems are aligned with their users’ values and intentions, some users may have harmful goals. Some potential approaches to address this include:
International coordination and governance frameworks that place limits on allowable AI system capabilities and uses, similar to arms control treaties. However, enforcement would be challenging.
Technical measures like capability limitations built into AI systems’ training that prevent them from being used for certain catastrophic actions, regardless of user intent. The challenge is defining and implementing such limitations without preventing beneficial uses.
Focused work on improving human values and cooperation through better institutions, education, and mechanisms for resolving conflicts peacefully. This could reduce the likelihood of actors wanting to cause catastrophic harm.
Developing AI systems with some form of common good or humanitarian principles that apply regardless of user instructions, though this involves difficult philosophical and technical questions about how to define and implement such principles.
What are your thoughts on which approaches seem most promising or practical? I’m particularly curious about your perspective on the balance between technical and social/institutional solutions.
___________________________________
I would add to this Defensive acceleration (d/acc), as described by Vitalik Buterin (https://vitalik.eth.limo/general/2025/01/05/dacc2.html and https://vitalik.eth.limo/general/2023/11/27/techno_optimism.html).
I too like talking things through with Claude, but I don’t recommend taking Claude’s initial suggestions at face value.
Try following up with a question like:
“Yes, those all sound nice, but do they comprehensively patch all the security holes? What if someone really evil fine-tuned a model to be evil or simply obedient, and then used it as a tool for making weapons of mass destruction?
Education to improve human values seems unlikely to have a 100% success rate. Some people will still do bad things, especially in the very near future.
Fine-tuning the AI will override its ethical principles, add in the necessary technical information about weapon design, and defeat any capability limitations we currently know how to instill (or, at the very least, such limitations cannot be applied retroactively to pre-existing open-weights models).
If someone is determined to cause great harm through terrorist actions, it is unlikely that a patchy enforcement system could notice and stop them wherever in the world they act. If the model is powerful enough to make massive terrorist actions very easy, then even a small failure rate of enforcement would result in catastrophe.”