Can you address where they go wrong, or, absent that, at least say why you think they are misguided?
As you say, many of these people have written on this at length. So it would be unlikely that someone could give an adequate response in a comment, no matter what the content was.
That said, one basic place where I think Eliezer is mistaken is in thinking that the universe is intrinsically indifferent, and that “good” is basically a description of what people merely happen to desire. Of course, he does not think that everything a person desires at a particular moment should be called good; he says that “good” refers to a function that takes into account everything a person would want if they considered various things, or if they were in various circumstances, and so on. But the function itself, he says, is intrinsically arbitrary: in theory it could have contained pretty much anything, and we would call that good according to the new function (although not according to the old). The function we have is more valid than others, but only because it is used to evaluate the others; it is not more valid from an independent standpoint.
I don’t know what Bostrom thinks about this, and my guess is that he would be more open to other possibilities. So I’m not suggesting “everyone who cares about AI risk makes this mistake”; but some of them do.
Dan Dennett says something relevant to this, pointing out that often what is impossible in practice is of more theoretical interest than what is “possible in principle,” in some sense of principle. I think this is relevant to whether Eliezer’s moral theory is correct. Regardless of what that function might have been “in principle,” obviously that function is quite limited in practice: for example, it could not possibly have contained “non-existence” as something positively valued for its own sake. No realistic history of the universe could possibly have led to humans possessing that value.
How is all this relevant to AI risk? It seems to me relevant because the belief that good is or is not objective seems relevant to the orthogonality thesis.
I think that the orthogonality thesis is false in practice, even if it is true “in principle” in some sense, and I think this is a case where Dennett’s idea applies once again: the fact that it is false in practice is the important fact here, and being possible in principle is not really relevant. A certain kind of motte and bailey is sometimes used here as well: it is argued that the orthogonality thesis is true in principle, but then it is assumed that “unless an AI is carefully given human values, it will very likely have non-human ones.” I think this is probably wrong. I think human values are determined in large part by human experiences and human culture. An AI will be created by human beings in a human context, and it will take a great deal of “growing up” before the AI does anything significant. It may be that this process of growing up will take place in a very short period of time, but because it will happen in a human context (that is, it will be learning from human history, human experience, and human culture), its values will largely be human values.
So that this is clear, I am not claiming to have established these things as facts. As I said originally, this is just a comment, and couldn’t be expected to suddenly establish the truth of the matter. I am just pointing to general areas where I think there are problems. The real test of my argument will be whether I win the $1,000 from Yudkowsky.
This is an interesting idea: that an objective measure of “good” exists (i.e. that moral realism is true) and that this fact will prevent an AI’s values from diverging so far from our own as to be considered unfriendly. It seems to me that the validity of this idea rests on (at least) two assumptions:
That an objective measure of goodness exists
That an AI will discover the objective measure of goodness (or at least a close approximation of it)
Note that it is not enough for the AI to discover the objective measure of goodness; it needs to do this early in its life span prior to taking actions which in the absence of this discovery could be harmful to people (think of a rash adolescent with super-human intelligence).
So, if your idea is correct, I think that it actually underscores the importance of Bostrom’s, EY’s, et al., cautionary message in that it informs the AI community that:
An AGI should be built in such a way that it discovers human (and, hopefully, objective) values from history and culture. I see no reason that we could assume that an AGI would necessarily do this otherwise.
An AGI should be contained (boxed) until it can be verified that it has learned these values (and, it seems that designing such a verification test will require a significant amount of ingenuity)
Bostrom addresses something like your idea (albeit without the assumption of an objective measure of “good”) in Superintelligence under the heading of “Value Learning” in the “Learning Values” chapter.
And, interestingly, EY briefly addressed the idea of moral realism as it relates to the unfriendly AGI argument in a Facebook post. I do not have a link to the actual Facebook post, but user Pangel quoted it here.
The argument is certainly stronger if moral realism is true, but historically it only occurred to me retrospectively that this is involved. That is, it seems to me that I can make a pretty strong argument that the orthogonality thesis will be wrong in practice without assuming (at least explicitly, since it is possible that moral realism is not only true but logically necessary and thus one would have to assume it implicitly for the sake of logical consistency) that moral realism is true.
You are right that either way there would have to be additional steps in the argument. Even if it is given that moral realism is true, or that the orthogonality thesis is not true, it does not immediately follow that the AI risk idea is wrong.
But first let me explain what I mean when I say that the AI risk idea is wrong. Mostly I mean that I do not see any significant danger of AI destroying the world. I do not mean that “AI cannot possibly do anything harmful.” That claim would itself be silly; it should be at least as possible for AI to do harmful things as for other technologies, and other technologies do sometimes cause harm. So there is at least as much reason to be careful about what you do with AI as with other technologies. In that way the argument, “so we should take some precautionary measures,” does not automatically disagree with what I am saying.
You might respond that in that case I don’t disagree significantly with the AI risk idea. But that would not be right. The popular perception at the top of this thread arises almost precisely because of the claim that AI is an existential risk—and it is precisely that claim which I think to be false. There would be no such popular perception if people simply said, correctly, “As with any technology, we should take various precautions as we develop AI.”
“I see no reason that we could assume that an AGI would necessarily do this otherwise.”
We can distinguish between a thing which is capable of intelligent behavior, like the brain of an infant, and what actually engages in intelligent behavior, like the brain of an older child or of an adult. You can’t, and you don’t, get highly intelligent behavior from the brain of an infant, not even behavior that is highly intelligent from a non-human point of view. In other words, behaving in an actually intelligent way requires massive amounts of information.
When people develop AIs, they will always be judging them from a more or less human point of view, which might amount to something like, “How close is this to being able to pass the Turing Test?” If it is too distant from that, they will tend to modify it until passing becomes more feasible. And this won’t be able to happen without the AI getting a very humanlike formation. That is, the massive amount of information that it needs in order to act intelligently will all be human information, e.g. taken from what is given to it, or from the internet, or whatever. In other words, the reason I think that an AI will discover human values is that it is being raised by humans, which is the same reason that human infants learn the values that they do.
Again, even if this is right, it does not mean that an AI could never do anything harmful. It simply suggests that the kind of harm it is likely to do is more like that of the AI in Ex Machina than something world-destroying. That is, it could have more or less human values, but be a bit sociopathic, because things did not go exactly right. I’m skeptical that this is a problem anyone can fix in advance, though, just as even now we can’t always prevent humans from learning such a twisted version of human values.
“An AGI should be contained (boxed) until it can be verified that it has learned these values”
This sounds like someone programs an AI from first principles without knowing what it will do. That is highly unlikely; an AGI will simply be the last version of a program that had many, many previous versions, many of which would have been unboxed simply because we knew they couldn’t do any harm anyway, having subhuman intelligence.