I think the arguments in this post are an okay defense of “ASI wouldn’t spare humanity because of trade”
I disagree, and I’d appreciate if someone would precisely identify the argument they found compelling in this post that argues for that exact thesis. As far as I can tell, the post makes the following supporting arguments for its claims (summarized):
Asking an unaligned superintelligence to spare humans is like asking Bernard Arnalt to donate $77 to you.
The law of comparative advantage does not imply that superintelligences will necessarily pay a high price for what humans have to offer, because of the existence of alternative ways for a superintelligence to get what it wants.
Superintelligences will “go hard enough” in the sense of using all reachable resources, rather than utilizing only some resources in the solar system and then stopping.
I claim that any actual argument for the proposition — that future unaligned AIs will not spare humanity because of trade — is missing from this post. The closest the post comes to arguing for this proposition is (2), but (2) does not demonstrate the proposition, both because (2) is only a claim about what the law of comparative advantage says, andbecause (2) does not talk at all about what humans could have to offer in the future that might be worth trading for.
In my view, one of the primary cruxes of the discussion is whether trade is less efficient than going to war between agents with dramatically different levels of power. A thoughtful discussion could have started about the conditions under which trade usefully occurs, and the ways in which future AIs will be similar to and different from these existing analogies. For example, the post could have talked about why nation-states trade with each other even in the presence of large differences in military power, but humans don’t trade with animals. However, the post included no such discussion, choosing instead to attack a “midwit” strawman.
Ok, but you can trivially fill in the rest of it, which is that Eliezer expects ASI to develop technology which makes it cheaper to ignore and/or disassemble humans than to trade with them (nanotech), and that there will not be other AIs around at the time which 1) would be valuable trade partners for the AI that develops that technology (which gives it that decisive strategic advantage over everyone else) and 2) care about humans at all. I don’t think discussion of when and why nation-states go to war with each other is particularly illuminating given the threat model.
If it is possible to trivially fill in the rest of his argument, then I think it is better for him to post that, instead of posting something that needs to be filled-in, and which doesn’t actually back up the thesis that people are interpreting him as arguing for. Precision is a virtue, and I’ve seen very few essays that actually provide this point about trade explicitly, as opposed to essays that perhaps vaguely allude to the points you have given, as this one apparently does too.
In my opinion, your filled-in argument seems to be a great example of why precision is necessary: to my eye, it contains bald assertions and unjustified inferences about a highly speculative topic, in a way that barely recognizes the degree of uncertainty we have about this domain. As a starting point, why does nanotech imply that it will be cheaper to disassemble humans than to trade with them? Are we assuming that humans cannot fight back against being disassembled, and moreover, is the threat of fighting back being factored into the cost-benefit analysis when the AIs are deciding whether to disassemble humans for their atoms vs. trade with them? Are our atoms really that valuable that it is worth it to pay the costs of violence to obtain them? And why are we assuming that “there will not be other AIs around at the time which 1) would be valuable trade partners for the AI that develops that technology (which gives it that decisive strategic advantage over everyone else) and 2) care about humans at all”?
Satisfying-sounding answers to each of these questions could undoubtedly be given, and I assume you can provide them. I don’t expect to find the answers fully persuasive, but regardless of what you think on the object-level, my basic meta-point stands: none of this stuff is obvious, and the essay is extremely weak without the added details that back up its background assumptions. It is very important to try to be truth-seeking and rigorously evaluate arguments on their merits. The fact that this essay is vague, and barely attempts to make a serious argument for one of its central claims, makes it much more difficult to evaluate concretely.
Two reasonable people could read this essay and come away with two very different ideas about what the essay is even trying to argue, given how much unstated inference you’re meant to “fill in”, instead of plain text that you can read. This is a problem, even if you agree with the underlying thesis the essay is supposed to argue for.
Edit: a substantial part of my objection is to this:
If it is possible to trivially fill in the rest of his argument, then I think it is better for him to post that, instead of posting something that needs to be filled-in, and which doesn’t actually back up the thesis that people are interpreting him as arguing for.
It is not worth always worth doing a three-month research project to fill in many details that you have already written up elsewhere in order to locally refute a bad argument that does not depend on those details. (The current post does locally refute several bad arguments, including that the law of comparative advantage means it must always be more advantageous to trade with humans. If you understand it to be making a much broader argument than that, I think that is the wrong understanding.)
Separately, it’s not clear to me whether you yourself could fill in those details. In other words, are you asking for those details to be filled in because you actually don’t know how Eliezer would fill them in, or because you have some other reason for asking for that additional labor (i.e. you think it’d be better for the public discourse if all of Eliezer’s essays included that level of detail)?
Original comment:
The essay is a local objection to a specific bad argument, which, yes, is more compelling if you’re familiar with Eliezer’s other beliefs on the subject. Eliezer has written about those beliefs fairly extensively, and much of his writing was answering various other objections (including many of those you listed). There does not yet exist a single ten-million-word treatise which provides an end-to-end argument of the level of detail you’re looking for. (There exist the Sequences, which are over a million words, but they while they implicitly answer many of these objections, they’re not structured to be a direct argument to this effect.)
As a starting point, why does nanotech imply that it will be cheaper to disassemble humans than to trade with them?
I think it would be much cheaper for you to describe a situation where an ASI develops the kind of nanotech that’d grant it technological self-sufficiency (and the ability to kill all humans), and it remains the case that trading with humans for any longer than it takes to bootstrap that nanotech is cheaper than just doing its own thing, while still being compatible with Eliezer’s model of the world. I have no idea what kind of reasoning or justification you would find compelling as an argument for “cheaper to disassemble”; it seems to require very little additional justification conditioning on that kind of nanotech being realized. My current guess is that you do not think that kind of nanotech is physically realizable by any ASI we are going to develop (including post-RSI), or maybe you think the ASI will be cognitively disadvantaged compared to humans in domains that it thinks are important (in ways that it can’t compensate for, or develop alternatives for, somehow).
There does not yet exist a single ten-million-word treatise which provides an end-to-end argument of the level of detail you’re looking for.
To be clear, I am not objecting to the length of his essay. It’s OK to be brief.
I am objecting to the vagueness of the argument. It follows a fairly typical pattern of certain MIRI essays by heavily relying on analogies, debunking straw characters, using metaphors rather than using clear and explicit English, and using stories as arguments, instead of concisely stating the exact premises and implications. I am objecting to the rhetorical flourish, not the word count.
This type of writing may be suitable for persuasion, but it does not seem very suitable for helping people build rigorous models of the world, which I also think is more important when posting on LessWrong.
My current guess is that you do not think that kind of nanotech is physically realizable by any ASI we are going to develop (including post-RSI), or maybe you think the ASI will be cognitively disadvantaged compared to humans in domains that it thinks are important (in ways that it can’t compensate for, or develop alternatives for, somehow).
I think neither of those things, and I entirely reject the argument that AIs will be fundamentally limited in the future in the way you suggested. If you are curious about why I think AIs will plausibly peacefully trade with humans in the future, rather than disassembling humans for their atoms, I would instead point to the facts that:
Trying to disassemble someone for their atoms is typically something the person will try to fight very hard against, if they become aware of your intentions to disassemble them.
Therefore, the cost of attempting to disassemble someone for their atoms does not merely include the technical costs associated with actually disassembling them, but additionally includes: (1) fighting the person who you are trying to kill and disassemble, (2) fighting whatever norms and legal structures are in place to prevent this type of predation against other agents in the world, and (3) the indirect cost of becoming the type of agent who predates on another person in this manner, which could make you an untrustworthy and violent person in the eyes of other agents, including other AIs who might fear you.
The benefit of disassembling a human is quite small, given the abundance of raw materials that substitute almost perfectly for the atoms that you can get from a human.
A rational agent will typically only do something if the benefits of the action outweigh the costs, rather than merely because the costs are small. Even if the costs of disassembling a human (as identified in point (2)) are small, that fact alone does not imply that a rational superintelligent AI would take such an action, precisely because the benefits of that action could be even smaller. And as just stated, we have good reasons to think that the benefits of disassembling a human are quite small in an absolute sense.
Therefore, it seems unlikely, or at least seems non-obvious, that a rational agent—even a very powerful one with access to advanced nanotech—will try to disassemble humans for their atoms.
Nothing in this argument is premised on the idea that AIs will be weak, less intelligent than humans, bounded in their goals, or limited in some other respect, except I suppose to the extent I’m assuming that AIs will be subject to environmental constraints, as opposed to instantly being able to achieve all of their goals at literally zero costs. I think AIs, like all physical beings, will exist in a universe in which they cannot get literally everything they want, and achieve the exact optimum of their utility function without any need to negotiate with anyone else. In other words, even if AIs are very powerful, I still think it may be beneficial for them to compromise with other agents in the world, including the humans, who are comparatively much less powerful than they are.
The claim at hand, that we have both read Eliezer repeatedly make[1], is that there is a sufficient level of intelligence and a sufficient power of nanotechnology that within days or weeks a system could design and innocuously build a nanotechnology factory out of simple biological materials that goes on to build either a disease or a cellular-sized drones that would quickly cause an extinction event — perhaps a virus that spreads quickly around the world with a replication rate that allows it to spread globally before any symptoms are found, or a series of diamond-based machines that can enter the bloodstream and explode on a coordinated signal. This is such a situation where no response from human civilization would occur, and the argument that an AI ought to be worried about people with guns and bombs coming for its data centers has no relevance.
Now to 2.2 & 2.3.
The above does not rule out a world where such a system has a host of other similarly-capable AIs to negotiate with and has norms of behavior with. But there is no known theory of returns on cognitive investment into intelligence, and so it is not ruled out that pouring 10x funds into a training run with a new architecture improvement won’t give a system abilities to do innovative science and deception on a qualitatively different level to any other AI system present at that time, and be able to initiate a takeover attempt. So it is worth preparing for such a world as, in the absence of a known theory of returns on cognitive investment, the worst case of expected-extinction may well be the defaultcase.
My lower-bound model of “how a sufficiently powerful intelligence would kill everyone, if it didn’t want to not do that” is that it gets access to the Internet, emails some DNA sequences to any of the many many online firms that will take a DNA sequence in the email and ship you back proteins, and bribes/persuades some human who has no idea they’re dealing with an AGI to mix proteins in a beaker, which then form a first-stage nanofactory which can build the actual nanomachinery. (Back when I was first deploying this visualization, the wise-sounding critics said “Ah, but how do you know even a superintelligence could solve the protein folding problem, if it didn’t already have planet-sized supercomputers?” but one hears less of this after the advent of AlphaFold 2, for some odd reason.) The nanomachinery builds diamondoid bacteria, that replicate with solar power and atmospheric CHON, maybe aggregate into some miniature rockets or jets so they can ride the jetstream to spread across the Earth’s atmosphere, get into human bloodstreams and hide, strike on a timer. Losing a conflict with a high-powered cognitive system looks at least as deadly as “everybody on the face of the Earth suddenly falls over dead within the same second”.
The claim at hand, that we have both read Eliezer repeatedly make[1], is that there is a sufficient level of intelligence and a sufficient power of nanotechnology that within days or weeks a system could design and innocuously build a nanotechnology factory out of simple biological materials that goes on to build either a disease or a cellular-sized drones that would quickly cause an extinction event — perhaps a virus that spreads quickly around the world with a replication rate that allows it to spread globally before any symptoms are found, or a series of diamond-based machines that can enter the bloodstream and explode on a coordinated signal. This is such a situation where no response from human civilization would occur, and the argument that an AI ought to be worried about people with guns and bombs coming for its data centers has no relevance.
Sure, I have also read Eliezer repeatedly make that claim. On the meta level, I don’t think the fact that he has written about this specific scenario fully makes up for the vagueness in his object-level essay above. But I’m also happy to briefly reply on the object level on this particular narrow point:
In short, I interpret Eliezer to be making a mistake by assuming that the world will not adapt to anticipated developments in nanotechnology and AI in order to protect against various attacks that we can easily see coming, prior to the time that AIs will be capable of accomplishing these incredible feats. By the time AIs are capable of developing such advanced molecular nanotech, I think the world will have already been dramatically transformed by prior waves of technologies, many of which by themselves could importantly change the gameboard, and change what it means for humans to have defenses against advanced nanotech to begin with.
As a concrete example, I think it’s fairly plausible that, by the time artificial superintelligences can create fully functional nanobots that are on-par with or better than biological machines, we will have already developed uploading technology that allows humans to literally become non-biological, implying that we can’t be killed by a virus in the first place. This would reduce the viability of using a virus to cause humanity to go extinct, increasing human robustness.
As a more general argument, and by comparison to Eliezer, I think that nanotechnology will probably be developed more incrementally and predictably, rather than suddenly upon the creation of a superintelligent AI, and the technology will be diffused across civilization, rather than existing solely in the hands of a small lab run by an AI. I also think Eliezer seems to be imagining that superintelligent AI will be created in a world that looks broadly similar to our current world, with defensive technologies that are only roughly as powerful as the ones that exist in 2024. However, I don’t think that will be the case.
Given an incremental and diffuse development trajectory, and transformative precursor technologies to mature nanotech, I expect society will have time to make preparations as the technology is developed, allowing us to develop defenses to such dramatic nanotech attacks alongside the offensive nanotechnologies that will also eventually be developed. It therefore seems unlikely to me that society will be completely caught by surprise by fully-developed-molecular nanotechnology, without any effective defenses.
This picture you describe is coherent. But I don’t read you to be claiming to have an argument or evidence that warrants making the assumption of gradualism (“incrementally and predictably”) in terms of the qualitative rate of capabilities gains from investment into AI systems, especially once the AIs are improving themselves. Because we don’t have any such theory of capability gains, it could well be that this picture is totally wrong and there will be great spikes. Uncertainty over the shape of the curve averages out into the expectation of a smooth curve, but our lack of knowledge about the shape is no argument for the true shape being smooth.
Not that many domains of capability look especially smooth. For instance if one is to count the general domains of knowledge, my very rough picture is that the GPTs went from like 10 to 1,000 to 1,100, in that it basically could not talk coherently and usefully about most subjects, and then it could, and then it could do so a bit better and marginal new domains added slowly. My guess is also that the models our civilization creates will go from “being able to automate very few jobs” to “can suddenly automate 100s of different jobs” in that it will go from not being trustworthy or reliable in many key contexts, and then with a single model or a few models in a row over a couple of years it will be able to do so. The next 10x spike on either such graph is not approached “incrementally and predictably”.
The example Eliezer gives of an AI developing nanotechnology in our current world is an example of a broader category of “ways that takeover is trivial given a sufficiently wide differential in capabilities/intelligence”. There are of course many possibilities for how an adversary with a wide differential in capabilities could have a decisive strategic advantage over humanity. Perhaps an AI will study human psychology and persuasion with far more data and statistical power than anything before and learn how to convince anyone to obey it the way a religious devotee relates to their prophet, or perhaps a system will get access to a whole country’s google docs and personal computers and security recording systems and be able to think about all of this in parallel in a way no state actor is able to, and go on to blackmail a whole string of relevant people in order to get control of a lot of explosives or nuclear weapons and use it to blackmail a country to do its bidding.
I repeat the lack of a theory of capability gains with respect to investment (including AI-assisted investment) means that astronomical differentials may be on-track to surprise us, far more than how GPT-2 and GPT-3 surprised most people in terms of being able to actually write at a human level. The nanotech example is an extreme example of how decisively that can play out.
I think maybe I derailed the conversation by saying “disassemble”, when really “kill” is all that’s required for the argument to go through. I don’t know what sort of fight you are imagining humans having with nanotech that imposes substantial additional costs on the ASI beyond the part where it needs to build & deploy the nanotech that actually does the “killing” part, but in this world I do not expect there to be a fight. I don’t think it requires being able to immediately achieve all of your goals at zero cost in order for it to be cheap for the ASI to do that, conditional on it having developed that technology.
I don’t know what sort of fight you are imagining humans having with nanotech that imposes substantial additional costs on the ASI beyond the part where it needs to build & deploy the nanotech that actually does the “killing” part, but in this world I do not expect there to be a fight.
The additional costs of human resistance don’t need to be high in an absolute sense. These costs only need to be higher than the benefit of killing humans, for your argument fail.
It is likewise very easy for the United States to invade and occupy Costa Rica—but that does not imply that it is rational for the United States to do so, because the benefits of invading Costa Rica are presumably even smaller than the costs of taking such an action, even without much unified resistance from Costa Rica.
What matters for the purpose of this argument is the relative magnitude of costs vs. benefits, not the absolute magnitude of the costs. It is insufficient to argue that the costs of killing humans are small. That fact alone does not imply that it is rational to kill humans, from the perspective of an AI. You need to further argue that the benefits of killing humans are even larger to establish the claim that a misaligned AI should rationally kill us.
To the extent your statement that “I don’t expect there to be a fight” means that you don’t think humans can realistically resist in any way that imposes costs on AIs, that’s essentially what I meant to respond to when I talked about the idea of AIs being able to achieve their goals at “zero costs”.
Of course, if you assume that AIs will be able to do whatever they want without any resistance whatsoever from us, then you can of course conclude that they will be able to achieve any goals they want without needing to compromise with us. If killing humans doesn’t cost anything, then yes I agree, the benefits of killing humans, however small, will be higher, and thus it will be rational for AIs to kill humans. I am doubting the claim that the cost of killing humans will be literally zero.
Even if this cost is small, it merely needs to be larger than the benefits of killing humans, for AIs to rationally avoid killing humans.
Of course, if you assume that AIs will be able to do whatever they want without any resistance whatsoever from us, then you can of course conclude that they will be able to achieve any goals they want without needing to compromise with us. If killing humans doesn’t cost anything, then yes, the benefits of killing humans, however small, will be higher, and thus it will be rational for AIs to kill humans. I am doubting the claim that the cost of killing humans will be literally zero.
See Ben’s comment for why the level of nanotech we’re talking about implies a cost of approximately zero.
I would also add: having more energy in the immediate future means more probes send out faster to more distant parts of the galaxy, which may be measured in “additional star systems colonized before they disappear outside the lightcone via universe expansion”. So the benefits are not trivial either.
I disagree, and I’d appreciate if someone would precisely identify the argument they found compelling in this post that argues for that exact thesis. As far as I can tell, the post makes the following supporting arguments for its claims (summarized):
Asking an unaligned superintelligence to spare humans is like asking Bernard Arnalt to donate $77 to you.
The law of comparative advantage does not imply that superintelligences will necessarily pay a high price for what humans have to offer, because of the existence of alternative ways for a superintelligence to get what it wants.
Superintelligences will “go hard enough” in the sense of using all reachable resources, rather than utilizing only some resources in the solar system and then stopping.
I claim that any actual argument for the proposition — that future unaligned AIs will not spare humanity because of trade — is missing from this post. The closest the post comes to arguing for this proposition is (2), but (2) does not demonstrate the proposition, both because (2) is only a claim about what the law of comparative advantage says, and because (2) does not talk at all about what humans could have to offer in the future that might be worth trading for.
In my view, one of the primary cruxes of the discussion is whether trade is less efficient than going to war between agents with dramatically different levels of power. A thoughtful discussion could have started about the conditions under which trade usefully occurs, and the ways in which future AIs will be similar to and different from these existing analogies. For example, the post could have talked about why nation-states trade with each other even in the presence of large differences in military power, but humans don’t trade with animals. However, the post included no such discussion, choosing instead to attack a “midwit” strawman.
Ok, but you can trivially fill in the rest of it, which is that Eliezer expects ASI to develop technology which makes it cheaper to ignore and/or disassemble humans than to trade with them (nanotech), and that there will not be other AIs around at the time which 1) would be valuable trade partners for the AI that develops that technology (which gives it that decisive strategic advantage over everyone else) and 2) care about humans at all. I don’t think discussion of when and why nation-states go to war with each other is particularly illuminating given the threat model.
If it is possible to trivially fill in the rest of his argument, then I think it is better for him to post that, instead of posting something that needs to be filled-in, and which doesn’t actually back up the thesis that people are interpreting him as arguing for. Precision is a virtue, and I’ve seen very few essays that actually provide this point about trade explicitly, as opposed to essays that perhaps vaguely allude to the points you have given, as this one apparently does too.
In my opinion, your filled-in argument seems to be a great example of why precision is necessary: to my eye, it contains bald assertions and unjustified inferences about a highly speculative topic, in a way that barely recognizes the degree of uncertainty we have about this domain. As a starting point, why does nanotech imply that it will be cheaper to disassemble humans than to trade with them? Are we assuming that humans cannot fight back against being disassembled, and moreover, is the threat of fighting back being factored into the cost-benefit analysis when the AIs are deciding whether to disassemble humans for their atoms vs. trade with them? Are our atoms really that valuable that it is worth it to pay the costs of violence to obtain them? And why are we assuming that “there will not be other AIs around at the time which 1) would be valuable trade partners for the AI that develops that technology (which gives it that decisive strategic advantage over everyone else) and 2) care about humans at all”?
Satisfying-sounding answers to each of these questions could undoubtedly be given, and I assume you can provide them. I don’t expect to find the answers fully persuasive, but regardless of what you think on the object-level, my basic meta-point stands: none of this stuff is obvious, and the essay is extremely weak without the added details that back up its background assumptions. It is very important to try to be truth-seeking and rigorously evaluate arguments on their merits. The fact that this essay is vague, and barely attempts to make a serious argument for one of its central claims, makes it much more difficult to evaluate concretely.
Two reasonable people could read this essay and come away with two very different ideas about what the essay is even trying to argue, given how much unstated inference you’re meant to “fill in”, instead of plain text that you can read. This is a problem, even if you agree with the underlying thesis the essay is supposed to argue for.
Edit: a substantial part of my objection is to this:
It is not worth always worth doing a three-month research project to fill in many details that you have already written up elsewhere in order to locally refute a bad argument that does not depend on those details. (The current post does locally refute several bad arguments, including that the law of comparative advantage means it must always be more advantageous to trade with humans. If you understand it to be making a much broader argument than that, I think that is the wrong understanding.)
Separately, it’s not clear to me whether you yourself could fill in those details. In other words, are you asking for those details to be filled in because you actually don’t know how Eliezer would fill them in, or because you have some other reason for asking for that additional labor (i.e. you think it’d be better for the public discourse if all of Eliezer’s essays included that level of detail)?
Original comment:
The essay is a local objection to a specific bad argument, which, yes, is more compelling if you’re familiar with Eliezer’s other beliefs on the subject. Eliezer has written about those beliefs fairly extensively, and much of his writing was answering various other objections (including many of those you listed). There does not yet exist a single ten-million-word treatise which provides an end-to-end argument of the level of detail you’re looking for. (There exist the Sequences, which are over a million words, but they while they implicitly answer many of these objections, they’re not structured to be a direct argument to this effect.)
I think it would be much cheaper for you to describe a situation where an ASI develops the kind of nanotech that’d grant it technological self-sufficiency (and the ability to kill all humans), and it remains the case that trading with humans for any longer than it takes to bootstrap that nanotech is cheaper than just doing its own thing, while still being compatible with Eliezer’s model of the world. I have no idea what kind of reasoning or justification you would find compelling as an argument for “cheaper to disassemble”; it seems to require very little additional justification conditioning on that kind of nanotech being realized. My current guess is that you do not think that kind of nanotech is physically realizable by any ASI we are going to develop (including post-RSI), or maybe you think the ASI will be cognitively disadvantaged compared to humans in domains that it thinks are important (in ways that it can’t compensate for, or develop alternatives for, somehow).
To be clear, I am not objecting to the length of his essay. It’s OK to be brief.
I am objecting to the vagueness of the argument. It follows a fairly typical pattern of certain MIRI essays by heavily relying on analogies, debunking straw characters, using metaphors rather than using clear and explicit English, and using stories as arguments, instead of concisely stating the exact premises and implications. I am objecting to the rhetorical flourish, not the word count.
This type of writing may be suitable for persuasion, but it does not seem very suitable for helping people build rigorous models of the world, which I also think is more important when posting on LessWrong.
I think neither of those things, and I entirely reject the argument that AIs will be fundamentally limited in the future in the way you suggested. If you are curious about why I think AIs will plausibly peacefully trade with humans in the future, rather than disassembling humans for their atoms, I would instead point to the facts that:
Trying to disassemble someone for their atoms is typically something the person will try to fight very hard against, if they become aware of your intentions to disassemble them.
Therefore, the cost of attempting to disassemble someone for their atoms does not merely include the technical costs associated with actually disassembling them, but additionally includes: (1) fighting the person who you are trying to kill and disassemble, (2) fighting whatever norms and legal structures are in place to prevent this type of predation against other agents in the world, and (3) the indirect cost of becoming the type of agent who predates on another person in this manner, which could make you an untrustworthy and violent person in the eyes of other agents, including other AIs who might fear you.
The benefit of disassembling a human is quite small, given the abundance of raw materials that substitute almost perfectly for the atoms that you can get from a human.
A rational agent will typically only do something if the benefits of the action outweigh the costs, rather than merely because the costs are small. Even if the costs of disassembling a human (as identified in point (2)) are small, that fact alone does not imply that a rational superintelligent AI would take such an action, precisely because the benefits of that action could be even smaller. And as just stated, we have good reasons to think that the benefits of disassembling a human are quite small in an absolute sense.
Therefore, it seems unlikely, or at least seems non-obvious, that a rational agent—even a very powerful one with access to advanced nanotech—will try to disassemble humans for their atoms.
Nothing in this argument is premised on the idea that AIs will be weak, less intelligent than humans, bounded in their goals, or limited in some other respect, except I suppose to the extent I’m assuming that AIs will be subject to environmental constraints, as opposed to instantly being able to achieve all of their goals at literally zero costs. I think AIs, like all physical beings, will exist in a universe in which they cannot get literally everything they want, and achieve the exact optimum of their utility function without any need to negotiate with anyone else. In other words, even if AIs are very powerful, I still think it may be beneficial for them to compromise with other agents in the world, including the humans, who are comparatively much less powerful than they are.
Responding to bullet 2.
First to 2.1.
The claim at hand, that we have both read Eliezer repeatedly make[1], is that there is a sufficient level of intelligence and a sufficient power of nanotechnology that within days or weeks a system could design and innocuously build a nanotechnology factory out of simple biological materials that goes on to build either a disease or a cellular-sized drones that would quickly cause an extinction event — perhaps a virus that spreads quickly around the world with a replication rate that allows it to spread globally before any symptoms are found, or a series of diamond-based machines that can enter the bloodstream and explode on a coordinated signal. This is such a situation where no response from human civilization would occur, and the argument that an AI ought to be worried about people with guns and bombs coming for its data centers has no relevance.
Now to 2.2 & 2.3.
The above does not rule out a world where such a system has a host of other similarly-capable AIs to negotiate with and has norms of behavior with. But there is no known theory of returns on cognitive investment into intelligence, and so it is not ruled out that pouring 10x funds into a training run with a new architecture improvement won’t give a system abilities to do innovative science and deception on a qualitatively different level to any other AI system present at that time, and be able to initiate a takeover attempt. So it is worth preparing for such a world as, in the absence of a known theory of returns on cognitive investment, the worst case of expected-extinction may well be the default case.
See Point 2 in AGI Ruin: A List of Lethalities for an example of this.
Sure, I have also read Eliezer repeatedly make that claim. On the meta level, I don’t think the fact that he has written about this specific scenario fully makes up for the vagueness in his object-level essay above. But I’m also happy to briefly reply on the object level on this particular narrow point:
In short, I interpret Eliezer to be making a mistake by assuming that the world will not adapt to anticipated developments in nanotechnology and AI in order to protect against various attacks that we can easily see coming, prior to the time that AIs will be capable of accomplishing these incredible feats. By the time AIs are capable of developing such advanced molecular nanotech, I think the world will have already been dramatically transformed by prior waves of technologies, many of which by themselves could importantly change the gameboard, and change what it means for humans to have defenses against advanced nanotech to begin with.
As a concrete example, I think it’s fairly plausible that, by the time artificial superintelligences can create fully functional nanobots that are on-par with or better than biological machines, we will have already developed uploading technology that allows humans to literally become non-biological, implying that we can’t be killed by a virus in the first place. This would reduce the viability of using a virus to cause humanity to go extinct, increasing human robustness.
As a more general argument, and by comparison to Eliezer, I think that nanotechnology will probably be developed more incrementally and predictably, rather than suddenly upon the creation of a superintelligent AI, and the technology will be diffused across civilization, rather than existing solely in the hands of a small lab run by an AI. I also think Eliezer seems to be imagining that superintelligent AI will be created in a world that looks broadly similar to our current world, with defensive technologies that are only roughly as powerful as the ones that exist in 2024. However, I don’t think that will be the case.
Given an incremental and diffuse development trajectory, and transformative precursor technologies to mature nanotech, I expect society will have time to make preparations as the technology is developed, allowing us to develop defenses to such dramatic nanotech attacks alongside the offensive nanotechnologies that will also eventually be developed. It therefore seems unlikely to me that society will be completely caught by surprise by fully-developed-molecular nanotechnology, without any effective defenses.
This picture you describe is coherent. But I don’t read you to be claiming to have an argument or evidence that warrants making the assumption of gradualism (“incrementally and predictably”) in terms of the qualitative rate of capabilities gains from investment into AI systems, especially once the AIs are improving themselves. Because we don’t have any such theory of capability gains, it could well be that this picture is totally wrong and there will be great spikes. Uncertainty over the shape of the curve averages out into the expectation of a smooth curve, but our lack of knowledge about the shape is no argument for the true shape being smooth.
Not that many domains of capability look especially smooth. For instance if one is to count the general domains of knowledge, my very rough picture is that the GPTs went from like 10 to 1,000 to 1,100, in that it basically could not talk coherently and usefully about most subjects, and then it could, and then it could do so a bit better and marginal new domains added slowly. My guess is also that the models our civilization creates will go from “being able to automate very few jobs” to “can suddenly automate 100s of different jobs” in that it will go from not being trustworthy or reliable in many key contexts, and then with a single model or a few models in a row over a couple of years it will be able to do so. The next 10x spike on either such graph is not approached “incrementally and predictably”.
The example Eliezer gives of an AI developing nanotechnology in our current world is an example of a broader category of “ways that takeover is trivial given a sufficiently wide differential in capabilities/intelligence”. There are of course many possibilities for how an adversary with a wide differential in capabilities could have a decisive strategic advantage over humanity. Perhaps an AI will study human psychology and persuasion with far more data and statistical power than anything before and learn how to convince anyone to obey it the way a religious devotee relates to their prophet, or perhaps a system will get access to a whole country’s google docs and personal computers and security recording systems and be able to think about all of this in parallel in a way no state actor is able to, and go on to blackmail a whole string of relevant people in order to get control of a lot of explosives or nuclear weapons and use it to blackmail a country to do its bidding.
I repeat the lack of a theory of capability gains with respect to investment (including AI-assisted investment) means that astronomical differentials may be on-track to surprise us, far more than how GPT-2 and GPT-3 surprised most people in terms of being able to actually write at a human level. The nanotech example is an extreme example of how decisively that can play out.
I think maybe I derailed the conversation by saying “disassemble”, when really “kill” is all that’s required for the argument to go through. I don’t know what sort of fight you are imagining humans having with nanotech that imposes substantial additional costs on the ASI beyond the part where it needs to build & deploy the nanotech that actually does the “killing” part, but in this world I do not expect there to be a fight. I don’t think it requires being able to immediately achieve all of your goals at zero cost in order for it to be cheap for the ASI to do that, conditional on it having developed that technology.
The additional costs of human resistance don’t need to be high in an absolute sense. These costs only need to be higher than the benefit of killing humans, for your argument fail.
It is likewise very easy for the United States to invade and occupy Costa Rica—but that does not imply that it is rational for the United States to do so, because the benefits of invading Costa Rica are presumably even smaller than the costs of taking such an action, even without much unified resistance from Costa Rica.
What matters for the purpose of this argument is the relative magnitude of costs vs. benefits, not the absolute magnitude of the costs. It is insufficient to argue that the costs of killing humans are small. That fact alone does not imply that it is rational to kill humans, from the perspective of an AI. You need to further argue that the benefits of killing humans are even larger to establish the claim that a misaligned AI should rationally kill us.
To the extent your statement that “I don’t expect there to be a fight” means that you don’t think humans can realistically resist in any way that imposes costs on AIs, that’s essentially what I meant to respond to when I talked about the idea of AIs being able to achieve their goals at “zero costs”.
Of course, if you assume that AIs will be able to do whatever they want without any resistance whatsoever from us, then you can of course conclude that they will be able to achieve any goals they want without needing to compromise with us. If killing humans doesn’t cost anything, then yes I agree, the benefits of killing humans, however small, will be higher, and thus it will be rational for AIs to kill humans. I am doubting the claim that the cost of killing humans will be literally zero.
Even if this cost is small, it merely needs to be larger than the benefits of killing humans, for AIs to rationally avoid killing humans.
See Ben’s comment for why the level of nanotech we’re talking about implies a cost of approximately zero.
I would also add: having more energy in the immediate future means more probes send out faster to more distant parts of the galaxy, which may be measured in “additional star systems colonized before they disappear outside the lightcone via universe expansion”. So the benefits are not trivial either.
Yeah ok I weakened my positive statement.
I am a bit confused on point 2. Other than trading or doing it your selfs what other ways are you thinking about getting something?