3,… Currently, my guess is we may want to steer to a trajectory where no single AI can kill everyone (at no point on the trajectory). Currently, no single AI can kill everyone—so maybe we want to maintain this property of the world / scale it, rather than e.g. create an AI sovereign which could unilaterally kill everyone but will be nice instead (at least until we’ve worked out a lot more of the theory of alignment and intelligence than we have so far).
(I don’t think the “killing everyone” threshold is a clear cap on capabilities—if you replace “kill everyone” with “own everything”, the property “no one owns everything” seems compatible with scaling of the economy.)
Consider the following hypotheses.
Hypothesis 1: humans with AI assistance can (and in fact will) build a nanobot defense system before an out-of-control AI would be powerful enough to deploy nanobots.
Hypothesis 2: humans with AI assistance can (and in fact will) build systems that robustly prevent hostile actors from tricking/bribing/hacking humanity into all-out nuclear war before an out-of-control AI would be powerful enough to do that.
Hypotheses 3, 4, 5, 6, 7…: Ditto for plagues, and disabling the power grid, and various forms of ecological collapse, and co-opting military hardware, and information warfare, etc. etc.
I think you believe that all these hypotheses are true. Is that right?
If so, this seems unlikely to me, for lots of reasons, both technological and social:
Some of the defensive measures might just be outright harder technologically than the offensive measures.
Some of the defensive measures would seem to require that humans are good at global coordination, and that humans will wisely prepare for uncertain, hypothetical future threats despite the immediate cost and inconvenience.
The human-AI teams would be constrained by laws, norms, Overton window, etc., in a way that an out-of-control AI would not.
The human-AI teams would be constrained by lack-of-complete-trust-in-the-AI, in a way that an out-of-control AI would not. For example, defending nuclear weapons systems against hacking-by-an-out-of-control-AI would seem to require that humans either give their (supposedly) aligned AIs root access to the nuclear weapons computer systems, or source code and schematics for those computer systems, or similar, and none of these seem like things that military people would actually do in real life. As another example, humans may not trust their AIs to do recursive self-improvement, but an out-of-control AI probably would if it could.
There are lots of hypotheses that I listed above, plus presumably many more that we can’t think of, and they’re more-or-less conjunctive. (Not perfectly conjunctive—if just one hypothesis is false, we’re probably OK, apart from the nanobot one—but there seem to be lots of ways for 2 or 3 of the hypotheses to be false such that we’re in big trouble.)
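To put a rough number on that conjunctiveness point, here is a toy sketch (the numbers and the independence assumption are mine, purely for illustration): with n such hypotheses, each holding with probability p, the chance that everything goes well shrinks quickly, and the chance that two or more fail is substantial.

```python
# Toy sketch (my assumptions, not from the comment above): n roughly independent
# "defense keeps pace with offense" hypotheses, each holding with probability p.
n = 7    # hypothetical number of threat categories
p = 0.8  # hypothetical per-hypothesis probability of holding

p_all_hold = p ** n  # everything goes well
# "Big trouble" region in the framing above: two or more hypotheses fail
# (ignoring the nanobot case, where a single failure may already be enough).
p_at_most_one_fails = p ** n + n * (1 - p) * p ** (n - 1)
p_two_or_more_fail = 1 - p_at_most_one_fails

print(f"P(all {n} hold): {p_all_hold:.2f}")        # ~0.21
print(f"P(2+ fail):     {p_two_or_more_fail:.2f}")  # ~0.42
```

Of course the real hypotheses are correlated (e.g. via how well global coordination goes in general), so this is only a rough intuition pump, not a forecast.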
Note that I don’t claim any special expertise; I mostly just want to help elevate this topic from unstated background assumption to an explicit argument where we figure out the right answer. :)
(I was recently discussing this topic in this thread.)
we may want to steer to a trajectory where no single AI can kill everyone
Want? Yes. We absolutely want that. So we should try to figure out whether that’s a realistic possibility. I’m suggesting that it might not be.
Can we also drop the “pivotal act” frame? Thinking in “pivotal acts” seems to be one of the root causes leading to discontinuities everywhere.