RE Disagreement 5: Some examples where aligned AIs would not consume the “free energy” that an out-of-control unaligned AI could exploit:
1. Exploiting the free energy requires humans trusting the AIs more than they actually do. For example, humans with a (supposedly) aligned AGI may not trust that AGI to secure their own nuclear weapons systems, or to hack into their enemies’ nuclear weapons systems, or to do recursive self-improvement, or to launch von Neumann probes that can never be called back. But an out-of-control AGI would presumably be willing to do all those things.
2. Exploiting the free energy requires violating human laws, norms, Overton windows, etc., or getting implausibly large numbers of human actors to agree with each other, or suffering large immediate costs for uncertain benefits, etc., such that humans don’t actually let their aligned AGIs do that. For example, maybe the only viable gray goo defense system consists of defensive nanobots that proliferate through the biosphere, harming wildlife and violating national boundaries. Would people + aligned AGIs actually go and deploy that system? I’m skeptical. Likewise, if there’s a neat trick to melt all the non-whitelisted GPUs on the planet, I find it hard to imagine that people + aligned AGIs would actually do anything with that knowledge, or even that they would go looking for that knowledge in the first place. But an out-of-control unaligned AGI wouldn’t hesitate.
3. Exploiting the free energy accomplishes a goal that no human would want accomplished, e.g. the removal of all oxygen from the atmosphere. Here, the attacking and defending AIs are trying to do two different things. Destroying a power grid may be much easier or much harder than preventing a power grid from being destroyed; a gray goo defense system may be much easier or much harder to create than gray goo itself; etc. I don’t know much about the attack-defense balance in any of these domains, but I’m concerned by the disjunctive nature of the problem: an out-of-control AGI would presumably attack in whatever way had the worst attack-defense imbalance.
(Above is somewhat redundant with Paul’s strategy-stealing post; like Zvi I thought it was a nice post but I drew the opposite conclusion.)
Seconding all of this.
Another way to state your second point: the only way to exploit that free energy may be through something that looks a lot like a ‘pivotal act’. And on your third point, there may be no acceptable way to exploit that free energy, in which case the only option is to prevent any equally capable unaligned AI from existing (not necessarily through a pivotal act, but Eliezer argues that’s the only practical way to do so).
I think the existence/accessibility of these kinds of free energy (offense-favored domains whose exploitation is outside the Overton window, or catastrophic) is a key crux for ‘pivotal act’ vs. gradual risk reduction strategies, plausibly the main one.
In terms of Paul’s point #2: this could still be irrelevant because earlier AI systems will have killed us in more boring ways, but the ‘radically advancing the state of human R&D’ branch may not meaningfully change our vulnerability. I think this motivates the ‘sudden doom’ story even if you predict a smooth increase in capabilities.