The claim at hand, that we have both read Eliezer repeatedly make[1], is that there is a sufficient level of intelligence and a sufficient power of nanotechnology that within days or weeks a system could design and innocuously build a nanotechnology factory out of simple biological materials that goes on to build either a disease or a cellular-sized drones that would quickly cause an extinction event — perhaps a virus that spreads quickly around the world with a replication rate that allows it to spread globally before any symptoms are found, or a series of diamond-based machines that can enter the bloodstream and explode on a coordinated signal. This is such a situation where no response from human civilization would occur, and the argument that an AI ought to be worried about people with guns and bombs coming for its data centers has no relevance.
Now to 2.2 & 2.3.
The above does not rule out a world where such a system has a host of other similarly-capable AIs to negotiate with and has norms of behavior with. But there is no known theory of returns on cognitive investment into intelligence, and so it is not ruled out that pouring 10x funds into a training run with a new architecture improvement won’t give a system abilities to do innovative science and deception on a qualitatively different level to any other AI system present at that time, and be able to initiate a takeover attempt. So it is worth preparing for such a world as, in the absence of a known theory of returns on cognitive investment, the worst case of expected-extinction may well be the defaultcase.
My lower-bound model of “how a sufficiently powerful intelligence would kill everyone, if it didn’t want to not do that” is that it gets access to the Internet, emails some DNA sequences to any of the many many online firms that will take a DNA sequence in the email and ship you back proteins, and bribes/persuades some human who has no idea they’re dealing with an AGI to mix proteins in a beaker, which then form a first-stage nanofactory which can build the actual nanomachinery. (Back when I was first deploying this visualization, the wise-sounding critics said “Ah, but how do you know even a superintelligence could solve the protein folding problem, if it didn’t already have planet-sized supercomputers?” but one hears less of this after the advent of AlphaFold 2, for some odd reason.) The nanomachinery builds diamondoid bacteria, that replicate with solar power and atmospheric CHON, maybe aggregate into some miniature rockets or jets so they can ride the jetstream to spread across the Earth’s atmosphere, get into human bloodstreams and hide, strike on a timer. Losing a conflict with a high-powered cognitive system looks at least as deadly as “everybody on the face of the Earth suddenly falls over dead within the same second”.
The claim at hand, that we have both read Eliezer repeatedly make[1], is that there is a sufficient level of intelligence and a sufficient power of nanotechnology that within days or weeks a system could design and innocuously build a nanotechnology factory out of simple biological materials that goes on to build either a disease or a cellular-sized drones that would quickly cause an extinction event — perhaps a virus that spreads quickly around the world with a replication rate that allows it to spread globally before any symptoms are found, or a series of diamond-based machines that can enter the bloodstream and explode on a coordinated signal. This is such a situation where no response from human civilization would occur, and the argument that an AI ought to be worried about people with guns and bombs coming for its data centers has no relevance.
Sure, I have also read Eliezer repeatedly make that claim. On the meta level, I don’t think the fact that he has written about this specific scenario fully makes up for the vagueness in his object-level essay above. But I’m also happy to briefly reply on the object level on this particular narrow point:
In short, I interpret Eliezer to be making a mistake by assuming that the world will not adapt to anticipated developments in nanotechnology and AI in order to protect against various attacks that we can easily see coming, prior to the time that AIs will be capable of accomplishing these incredible feats. By the time AIs are capable of developing such advanced molecular nanotech, I think the world will have already been dramatically transformed by prior waves of technologies, many of which by themselves could importantly change the gameboard, and change what it means for humans to have defenses against advanced nanotech to begin with.
As a concrete example, I think it’s fairly plausible that, by the time artificial superintelligences can create fully functional nanobots that are on-par with or better than biological machines, we will have already developed uploading technology that allows humans to literally become non-biological, implying that we can’t be killed by a virus in the first place. This would reduce the viability of using a virus to cause humanity to go extinct, increasing human robustness.
As a more general argument, and by comparison to Eliezer, I think that nanotechnology will probably be developed more incrementally and predictably, rather than suddenly upon the creation of a superintelligent AI, and the technology will be diffused across civilization, rather than existing solely in the hands of a small lab run by an AI. I also think Eliezer seems to be imagining that superintelligent AI will be created in a world that looks broadly similar to our current world, with defensive technologies that are only roughly as powerful as the ones that exist in 2024. However, I don’t think that will be the case.
Given an incremental and diffuse development trajectory, and transformative precursor technologies to mature nanotech, I expect society will have time to make preparations as the technology is developed, allowing us to develop defenses to such dramatic nanotech attacks alongside the offensive nanotechnologies that will also eventually be developed. It therefore seems unlikely to me that society will be completely caught by surprise by fully-developed-molecular nanotechnology, without any effective defenses.
This picture you describe is coherent. But I don’t read you to be claiming to have an argument or evidence that warrants making the assumption of gradualism (“incrementally and predictably”) in terms of the qualitative rate of capabilities gains from investment into AI systems, especially once the AIs are improving themselves. Because we don’t have any such theory of capability gains, it could well be that this picture is totally wrong and there will be great spikes. Uncertainty over the shape of the curve averages out into the expectation of a smooth curve, but our lack of knowledge about the shape is no argument for the true shape being smooth.
Not that many domains of capability look especially smooth. For instance if one is to count the general domains of knowledge, my very rough picture is that the GPTs went from like 10 to 1,000 to 1,100, in that it basically could not talk coherently and usefully about most subjects, and then it could, and then it could do so a bit better and marginal new domains added slowly. My guess is also that the models our civilization creates will go from “being able to automate very few jobs” to “can suddenly automate 100s of different jobs” in that it will go from not being trustworthy or reliable in many key contexts, and then with a single model or a few models in a row over a couple of years it will be able to do so. The next 10x spike on either such graph is not approached “incrementally and predictably”.
The example Eliezer gives of an AI developing nanotechnology in our current world is an example of a broader category of “ways that takeover is trivial given a sufficiently wide differential in capabilities/intelligence”. There are of course many possibilities for how an adversary with a wide differential in capabilities could have a decisive strategic advantage over humanity. Perhaps an AI will study human psychology and persuasion with far more data and statistical power than anything before and learn how to convince anyone to obey it the way a religious devotee relates to their prophet, or perhaps a system will get access to a whole country’s google docs and personal computers and security recording systems and be able to think about all of this in parallel in a way no state actor is able to, and go on to blackmail a whole string of relevant people in order to get control of a lot of explosives or nuclear weapons and use it to blackmail a country to do its bidding.
I repeat the lack of a theory of capability gains with respect to investment (including AI-assisted investment) means that astronomical differentials may be on-track to surprise us, far more than how GPT-2 and GPT-3 surprised most people in terms of being able to actually write at a human level. The nanotech example is an extreme example of how decisively that can play out.
Responding to bullet 2.
First to 2.1.
The claim at hand, that we have both read Eliezer repeatedly make[1], is that there is a sufficient level of intelligence and a sufficient power of nanotechnology that within days or weeks a system could design and innocuously build a nanotechnology factory out of simple biological materials that goes on to build either a disease or a cellular-sized drones that would quickly cause an extinction event — perhaps a virus that spreads quickly around the world with a replication rate that allows it to spread globally before any symptoms are found, or a series of diamond-based machines that can enter the bloodstream and explode on a coordinated signal. This is such a situation where no response from human civilization would occur, and the argument that an AI ought to be worried about people with guns and bombs coming for its data centers has no relevance.
Now to 2.2 & 2.3.
The above does not rule out a world where such a system has a host of other similarly-capable AIs to negotiate with and has norms of behavior with. But there is no known theory of returns on cognitive investment into intelligence, and so it is not ruled out that pouring 10x funds into a training run with a new architecture improvement won’t give a system abilities to do innovative science and deception on a qualitatively different level to any other AI system present at that time, and be able to initiate a takeover attempt. So it is worth preparing for such a world as, in the absence of a known theory of returns on cognitive investment, the worst case of expected-extinction may well be the default case.
See Point 2 in AGI Ruin: A List of Lethalities for an example of this.
Sure, I have also read Eliezer repeatedly make that claim. On the meta level, I don’t think the fact that he has written about this specific scenario fully makes up for the vagueness in his object-level essay above. But I’m also happy to briefly reply on the object level on this particular narrow point:
In short, I interpret Eliezer to be making a mistake by assuming that the world will not adapt to anticipated developments in nanotechnology and AI in order to protect against various attacks that we can easily see coming, prior to the time that AIs will be capable of accomplishing these incredible feats. By the time AIs are capable of developing such advanced molecular nanotech, I think the world will have already been dramatically transformed by prior waves of technologies, many of which by themselves could importantly change the gameboard, and change what it means for humans to have defenses against advanced nanotech to begin with.
As a concrete example, I think it’s fairly plausible that, by the time artificial superintelligences can create fully functional nanobots that are on-par with or better than biological machines, we will have already developed uploading technology that allows humans to literally become non-biological, implying that we can’t be killed by a virus in the first place. This would reduce the viability of using a virus to cause humanity to go extinct, increasing human robustness.
As a more general argument, and by comparison to Eliezer, I think that nanotechnology will probably be developed more incrementally and predictably, rather than suddenly upon the creation of a superintelligent AI, and the technology will be diffused across civilization, rather than existing solely in the hands of a small lab run by an AI. I also think Eliezer seems to be imagining that superintelligent AI will be created in a world that looks broadly similar to our current world, with defensive technologies that are only roughly as powerful as the ones that exist in 2024. However, I don’t think that will be the case.
Given an incremental and diffuse development trajectory, and transformative precursor technologies to mature nanotech, I expect society will have time to make preparations as the technology is developed, allowing us to develop defenses to such dramatic nanotech attacks alongside the offensive nanotechnologies that will also eventually be developed. It therefore seems unlikely to me that society will be completely caught by surprise by fully-developed-molecular nanotechnology, without any effective defenses.
This picture you describe is coherent. But I don’t read you to be claiming to have an argument or evidence that warrants making the assumption of gradualism (“incrementally and predictably”) in terms of the qualitative rate of capabilities gains from investment into AI systems, especially once the AIs are improving themselves. Because we don’t have any such theory of capability gains, it could well be that this picture is totally wrong and there will be great spikes. Uncertainty over the shape of the curve averages out into the expectation of a smooth curve, but our lack of knowledge about the shape is no argument for the true shape being smooth.
Not that many domains of capability look especially smooth. For instance if one is to count the general domains of knowledge, my very rough picture is that the GPTs went from like 10 to 1,000 to 1,100, in that it basically could not talk coherently and usefully about most subjects, and then it could, and then it could do so a bit better and marginal new domains added slowly. My guess is also that the models our civilization creates will go from “being able to automate very few jobs” to “can suddenly automate 100s of different jobs” in that it will go from not being trustworthy or reliable in many key contexts, and then with a single model or a few models in a row over a couple of years it will be able to do so. The next 10x spike on either such graph is not approached “incrementally and predictably”.
The example Eliezer gives of an AI developing nanotechnology in our current world is an example of a broader category of “ways that takeover is trivial given a sufficiently wide differential in capabilities/intelligence”. There are of course many possibilities for how an adversary with a wide differential in capabilities could have a decisive strategic advantage over humanity. Perhaps an AI will study human psychology and persuasion with far more data and statistical power than anything before and learn how to convince anyone to obey it the way a religious devotee relates to their prophet, or perhaps a system will get access to a whole country’s google docs and personal computers and security recording systems and be able to think about all of this in parallel in a way no state actor is able to, and go on to blackmail a whole string of relevant people in order to get control of a lot of explosives or nuclear weapons and use it to blackmail a country to do its bidding.
I repeat the lack of a theory of capability gains with respect to investment (including AI-assisted investment) means that astronomical differentials may be on-track to surprise us, far more than how GPT-2 and GPT-3 surprised most people in terms of being able to actually write at a human level. The nanotech example is an extreme example of how decisively that can play out.