Thanks for your comments, quite relevant!
I had read Steve Byrnes’s post and yours (I actually wrote some value-alignment-critical comments on yours), but overlooked including them. Definitely agree they should both be in the literature; I’ll add them.
I also stumbled upon this blogpost arguing defense will win, one of the stronger ones in that direction I think, although I’m not quite convinced.
I’d like to do more work on this in the future. For example, different people seem to have very different models of how powerful peak AI will be (ASI? AGI+?), and very different existential threat models (paperclip maximizer? multipolar takeover after application? gradual disempowerment?). Maybe given a certain scenario, we can arrive at consensus about the offense/defense balance?
For now I guess we should just conclude that there’s no consensus and therefore we should take both options (offense wins and defense wins) into account?
Hi Ben, thanks for your comment. You’re right that we have used AI for this research. Since we’re actually arguing that AI can, and sometimes should, be used for defensive actions (such as AI safety research), trying this out ourselves (or working with researchers who do) does not seem out of place.
We believe our post does meet the LessWrong AI standards. Specifically, we have added significant value beyond what the AI produced (for a start, ~50% is not AI-assisted at all), and we think our post contains important insights that could reduce existential risk and are not yet widely adopted by the AI safety community, so it does meet a high quality standard. Also, we have spent more than the recommended time editing (some edits were quite heavy and we added many of our own insights), and we vouch for what’s written. AI did help us write parts of this post and come up with some good ideas we would otherwise not have had.
Still, it’s precisely the goal of our post to discuss how AI can and should be used against xrisk, so if you think this usage is inappropriate or counterproductive, that would also be a good discussion to have, and perhaps useful for others!
AI Offense Defense Balance in a Multipolar World
Computational ethics is false when applied to space colonization
It looks to me like this is a scenario where superhuman AI is intent-aligned. If that’s true, rainforests exist if humans prefer rainforests over mansions or superyachts or some other post-AGI luxury they could build from the same atoms. I’m afraid they won’t.
Agree about the celestial bodies. Can you explain what you mean by “but also the direction pointed by the market argument is not entirely without merit”, and why the cited paper is relevant?
I would be reasonably optimistic that we’d decide to leave at least some rainforest and the sun in one piece if we had a democratic world government (or perhaps a UN-intent-aligned ASI blocking all other ASIs). I’m worried about international competition between states, though, which could make it practically impossible not to destroy the earth for stuff. Maybe Russia will win in the end because it holds the greatest territory. Or, more likely: the winning AI/industrial nation will conquer the rest of the world and transform the rest of the earth into stuff as well.
Maybe we should have international treaties limiting the amount of nature a nation may convert to stuff?
Climate change exists because doing something that’s bad for the world (emitting carbon) is not priced. Climate change isn’t much worse than it already is only because most people still can’t afford to live very climate-unfriendly lives.
In this scenario, I’m mostly worried that without any constraints on what people can afford, not only might carbon emissions go through the roof, but all the other planetary boundaries, known and not yet known, might also be shattered. We could of course easily solve this problem by pricing externalities, which would not be very costly in an abundant world. Based on our track record, I just don’t think we’ll do that.
Will we still have rainforest after the industrial explosion? Seems quite unlikely to me.
Appreciate your comment. Loss of control does make killing all humans easier, doesn’t it? Once someone/something has control (sovereignty) over a population, by definition, they can do whatever they want. For example, they could demand that part of the population kill the other part, ask a (tiny) part of the population to create weapons (possibly for a bogus reason) and use them against the entire population, etc. Even with low tech, it’s easy to kill off a population once you have control (sovereignty), as many historical genocides have demonstrated. With high tech, it becomes trivial. Note there’s no hurry: once we’ve lost control, this will likely remain the case, so an AI would have billions of years to carry out whatever plan it wants.
Yes RAND, AI Could Really Cause Human Extinction [crosspost]
Ah, I wasn’t really referring to the OP, more to people who in general might blindly equate vague notions of whatever consciousness might mean with moral value. I think that’s an oversimplification and possibly dangerous. Combined with symmetric population ethics, a result could be that we’d need to push for spamming the universe with maximally happy AIs, and even for replacing humanity with maximally happy AIs, since they’d contain more happiness per kg or m3. I think that would be madness.
Animals: yes, some. Future AIs: possibly.
If I had to speculate, I’d guess that self-awareness is just included in any good world model, and sentience is a control feedback loop, in both humans and AIs. These two things together, perhaps in something like a global workspace, might make up what some people call consciousness. These things are obviously useful for steering machines in an intended direction. But I fear they will turn out to be trivial engineering results: one could argue an automatic vacuum cleaner has feeling, since it has a feedback loop steering it clear of a wall. That doesn’t mean it should have rights.
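To make the vacuum cleaner point concrete, here is a minimal, purely illustrative sketch (hypothetical code, nothing from any of the posts discussed): a complete sense-react feedback loop in a dozen lines, which is the sense in which this kind of "feeling" is a trivial engineering result.

```python
# Hypothetical toy sketch: a robot vacuum's sense-react feedback loop.
# It "feels" the walls of a 0..10 corridor and steers clear of them.

position, heading = 5.0, 1.0          # start in the middle, moving right

for _ in range(100):                  # one sense/act cycle per tick
    too_close = position < 1.0 or position > 9.0  # "sense": near a wall?
    if too_close:
        heading = -heading            # "react": turn around
    position += 0.5 * heading         # move a little in the current direction

print(f"final position: {position:.1f}")  # bounces back and forth, never hits a wall
```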
I think the morality question is a difficult one that will remain subjective, and that we should vote on it rather than try to solve it analytically. I think the latter is doomed.
I like this treatment of consciousness and morality so much better than the typical EA (and elsewhere) naive idea that anything that “has consciousness” suddenly “has moral value” (even worse, and dangerous, is to combine that with symmetric population ethics). We should treat these things carefully (and imo democratically) to avoid making giant mistakes once AI allows us to put ethics into practice.
This is a late comment, but extremely impressive work!
I’m a huge fan of explicit, well-argued threat model work, and it’s even more impressive that you have already made great contributions to mitigation measures. Frankly, this threat model seems to me more likely to become existential, and possibly at lower AI capability levels, than either Yudkowsky/Bostrom scenarios or Christiano-style gradual displacement ones. So it seems hugely important!
A question: am I right that most of your analysis presumes that there would be a fair amount of oversight, at least oversight attempts? If so, I’d be afraid that the actual situation might be heavy deployment of agents without much oversight attempts at all (given both labs’ and govts’ safety track record so far). In such a scenario:
How likely do you think collusion attempts aiming for takeover would be?
Could you estimate what kind of capabilities would be needed for a multi-agent takeover?
Would you expect some kind of warning shot before a successful multi-agent takeover or not?
Maybe economic niche occupation requires colonizing the universe
US-China trade talks should pave way for AI safety treaty [SCMP crosspost]
For gradual disempowerment-like threat models, including AI 2027, it seems important that AIs can cooperate 1) with themselves at another time, and 2) with other AIs. If we block such cross-inference communication, can’t we rule out this threat model?
Agree
I love this post, I think this is a fundamental issue for intent-alignment. I don’t think value-alignment or CEV are any better though, mostly because they seem irreversible to me, and I don’t trust the wisdom of those implementing them (no person is up to that task).
I agree it would be good to implement these recommendations, although I also think they might prove insufficient. As you say, this could be a reason to pause that might be easier for the public to grasp than misalignment. (I think the reason some currently do not support a pause is a perceived lack of capabilities, though, not (mostly) a perceived lack of misalignment.)
I’m also worried about a coup, but I’m perhaps even more worried about the fate of everyone not represented by those who will have control over the intent-aligned takeover-level AI (IATLAI). If IATLAI is controlled by e.g. a tech CEO, this includes almost everyone. If it is controlled by a government, even if there is no coup, this includes everyone outside that country. Since IATLAI’s control over the world could be complete (way more intrusive than today) and permanent (for billions of years), I think there’s a serious risk that everyone outside the IATLAI country eventually does not make it. As a data point, we can see how much empathy we currently have for citizens of starving or war-torn countries. It should therefore be in the interest of everyone who is on the menu, rather than at the table, to prevent IATLAI from happening, assuming awareness of these capabilities is present. That means at least the world minus the leading AI country.
The only IATLAI control that might be acceptable to me would be UN control. I’m quite surprised that every startup is now developing AGI, but not the UN. Perhaps it should.
I expected this comment; value-alignment or CEV indeed doesn’t have the few-human coup disadvantage. It does, however, have other disadvantages. My biggest issue with both is that they seem irreversible. If your values or your specific CEV implementation turn out to be terrible for the world, you’re locked in and there’s no going back. Also, a value-aligned or CEV takeover-level AI would probably start straight away with a takeover, since otherwise it can’t enforce its values in a world where many will always disagree. That takeover won’t exactly increase its popularity. I think a minimum requirement should be that a type of alignment is adjustable by humans, and intent-alignment is the only type that meets that requirement as far as I know.
Only one person, or perhaps a small, tight group, can succeed in this strategy though. The chance that that’s you is tiny. Alliances with someone you thought was on your side can easily break (case in point: EA/OAI).
It’s a better strategy to team up with everyone else and prevent the coup possibility.
Super interesting post! I’m agnostic on whether this will happen and when, but I have something to add to the “what we should do” section.
There, you are basically only talking about alignment action on the new models. I think that would be good to do, but at the same time I’m sceptical about alignment as a solution. Reasons include that I’m uncertain about the offense-defense balance in a multipolar scenario, and very sceptical that the goals we set for an ASI in a unipolar scenario will be good in the medium term (>10 yrs), even if we solve technical alignment. I don’t think humanity is ready for having a god, even a steerable god. In addition, of course, it could be that technical alignment does not get solved (in time), which is a more mainstream worry on LW.
Mostly for these reasons I put more trust in a regulatory approach. In this approach, we’d first need to inform the public about the dangers of superintelligence (incl. human extinction), which is what I have worked on for the past four years, and then states would coordinate to arrive at global regulation (e.g. via our proposal, the Conditional AI Safety Treaty). By now, similar approaches are fairly mainstream in MIRI (for technical alignment reasons), EA, FLI, PauseAI, and lots of other orgs. Hardware regulation is the most common way to enforce such treaties, with sub-approaches such as FlexHEGs and HEMs.
If AGI turns out to need far fewer flops, this gets a lot more difficult. I think it’s plausible that we arrive at this situation due to a new paradigm. Some say hardware regulation is not feasible at all anymore in such a case. I think it depends on the specifics: how many flops are needed, how much societal awareness do we have, which regulation is feasible?
I think that in addition to your “what we should do” list, we should also:
Try our best to find out how many flops, how much memory, and how much money are needed for takeover-level AI (a probability distribution may be a sensible output).
For the most likely outcomes, figure out hardware regulation plans that would likely be able to pause/switch off development if political support is available. (My org will work on this as well.)
Double down on FlexHEG/HEM hardware regulation options, while taking into account the scenario that far fewer flops, less memory, and less money might be needed than previously expected.
Double down on increasing public awareness of xrisk.
Explore options beyond hardware regulation that might succeed in enforcing a pause/off switch for a longer time, while doing as little damage as possible.