"we may want to steer to a trajectory where no single AI can kill everyone"
Want? Yes. We absolutely want that. So we should try to figure out whether that’s a realistic possibility. I’m suggesting that it might not be.
Consider the following hypotheses.
Hypothesis 1: humans with AI assistance can (and in fact will) build a nanobot defense system before an out-of-control AI would be powerful enough to deploy nanobots.
Hypothesis 2: humans with AI assistance can (and in fact will) build systems that robustly prevent hostile actors from tricking/bribing/hacking humanity into all-out nuclear war before an out-of-control AI would be powerful enough to do that.
Hypotheses 3, 4, 5, 6, 7, …: Ditto for plagues, disabling the power grid, various forms of ecological collapse, co-opting military hardware, information warfare, etc.
I think you believe that all these hypotheses are true. Is that right?
If so, this seems unlikely to me, for lots of reasons, both technological and social:
Some of the defensive measures might just be outright harder technologically than the offensive measures.
Some of the defensive measures would seem to require that humans be good at global coordination, and that they wisely prepare for uncertain, hypothetical future threats despite immediate cost and inconvenience.
The human-AI teams would be constrained by laws, norms, the Overton window, etc., in a way that an out-of-control AI would not be.
The human-AI teams would be constrained by a lack of complete trust in the AI, in a way that an out-of-control AI would not be. For example, defending nuclear weapons systems against hacking by an out-of-control AI would seem to require that humans either give their (supposedly) aligned AIs root access to the nuclear weapons computer systems, or the source code and schematics for those systems, or something similar, and none of these seem like things that military people would actually do in real life. As another example, humans may not trust their AIs to do recursive self-improvement, but an out-of-control AI probably would if it could.
There are lots of hypotheses that I listed above, plus presumably many more that we can't think of, and they're more-or-less conjunctive. (Not perfectly conjunctive: if just one hypothesis is false we're probably OK, unless it's the nanobot one; but there seem to be lots of ways for two or three of the hypotheses to be false such that we're in big trouble.)
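To make the conjunctive point concrete, here is a rough illustration with made-up numbers (assuming, unrealistically, that the hypotheses are independent and equally likely): if each of, say, seven such hypotheses held with probability 0.9, the probability that all of them hold would be only about half:

$$P(\text{all hold}) = 0.9^7 \approx 0.48$$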
Note that I don’t claim any special expertise, I mostly just want to help elevate this topic from unstated background assumption to an explicit argument where we figure out the right answer. :)
(I was recently discussing this topic in this thread.)