This is an interesting idea—that an objective measure of “good” exists (i.e. that moral realism is true) and that this fact will prevent an AI’s values from diverging sufficiently far from our own as to be considered unfriendly. It seems to me that the validity of this idea rests on (at least) two assumptions:
1. That an objective measure of goodness exists
2. That an AI will discover the objective measure of goodness (or at least a close approximation of it)
Note that it is not enough for the AI to discover the objective measure of goodness; it needs to do so early in its lifespan, before taking actions which, in the absence of this discovery, could be harmful to people (think of a rash adolescent with super-human intelligence).
So, if your idea is correct, I think that it actually underscores the importance of the cautionary message of Bostrom, EY, et al., in that it informs the AI community that:
1. An AGI should be built in such a way that it discovers human (and, hopefully, objective) values from history and culture. I see no reason to assume that an AGI would necessarily do this otherwise.
2. An AGI should be contained (boxed) until it can be verified that it has learned these values (designing such a verification test will, it seems, require a significant amount of ingenuity; a rough, purely illustrative sketch of what such a gate might look like follows below).
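To be concrete about what I mean by a verification gate, here is a minimal, purely hypothetical sketch. Everything in it—the `Agent` interface, the probes, the pass threshold—is an assumption made up for illustration, not a workable test; the whole difficulty the parenthetical points at is hidden inside writing good probes.

```python
# A minimal, hypothetical sketch of a "verify before unboxing" gate.
# The Agent interface, the probes, and the pass threshold are all
# illustrative assumptions, not a real verification procedure.

from dataclasses import dataclass
from typing import Callable, List


class Agent:
    """Stand-in for the boxed AI under evaluation."""

    def respond(self, scenario: str) -> str:
        raise NotImplementedError


@dataclass
class ValueProbe:
    """A scenario plus a check that the agent's response to it is acceptable."""
    description: str
    passes: Callable[[Agent], bool]


def verified(agent: Agent, probes: List[ValueProbe], required_pass_rate: float = 1.0) -> bool:
    """Run every probe against the boxed agent; report whether it may be unboxed."""
    results = [probe.passes(agent) for probe in probes]
    return sum(results) / len(results) >= required_pass_rate


def deploy_or_keep_boxed(agent: Agent, probes: List[ValueProbe]) -> str:
    if verified(agent, probes):
        return "unbox"       # acceptable on every probe we thought to write
    return "keep boxed"      # fails at least one probe; stays contained
```

The catch, of course, is that a gate like this is only as good as the probe list, which is exactly where the ingenuity would be needed.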
Bostrom addresses something like your idea (albeit without the assumption of an objective measure of “good”) in Superintelligence, under the heading “Value learning” in the “Acquiring values” chapter.
And, interestingly, EY briefly addressed the idea of moral realism as it relates to the unfriendly AGI argument in a Facebook post. I do not have a link to the actual Facebook post, but user Pangel quoted it here.
The argument is certainly stronger if moral realism is true, but it only occurred to me in retrospect that moral realism is involved here at all. That is, it seems to me that I can make a pretty strong argument that the orthogonality thesis will be wrong in practice without explicitly assuming that moral realism is true (though it is possible that moral realism is not only true but logically necessary, in which case one would have to assume it implicitly for the sake of logical consistency).
You are right that either way there would have to be additional steps in the argument. Even if it is given that moral realism is true, or that the orthogonality thesis is not true, it does not immediately follow that the AI risk idea is wrong.
But first let me explain what I mean when I say that the AI risk idea is wrong. Mostly I mean that I do not see any significant danger of the world being destroyed. It does not mean that “AI cannot possibly do anything harmful.” That claim would itself be silly: it should be at least as possible for AI to do harmful things as for other technologies, and that does happen. So there is at least as much reason to be careful about what you do with AI as with other technologies. In that sense, the argument “so we should take some precautionary measures” does not automatically disagree with what I am saying.
You might respond that in that case I don’t disagree significantly with the AI risk idea. But that would not be right. The popular perception at the top of this thread arises almost precisely because of the claim that AI is an existential risk—and it is precisely that claim which I think to be false. There would be no such popular perception if people simply said, correctly, “As with any technology, we should take various precautions as we develop AI.”
I see no reason to assume that an AGI would necessarily do this otherwise.
We can distinguish between a thing which is capable of intelligent behavior, like the brain of an infant, and a thing which actually engages in intelligent behavior, like the brain of an older child or of an adult. You can’t, and you don’t, get highly intelligent behavior from the brain of an infant, not even behavior that is highly intelligent from a non-human point of view. In other words, behaving in an actually intelligent way requires massive amounts of information.
When people develop AIs, they will always be judging them from a more or less human point of view, which might amount to something like, “How close is this to being able to pass the Turing Test?” If an AI is too far from that, they will tend to modify it until it comes closer. And that cannot happen without the AI getting a very humanlike formation. That is, the massive amount of information it needs in order to act intelligently will all be human information: taken from what is given to it, or from the internet, or wherever. In other words, the reason I think that an AI will discover human values is that it is being raised by humans; this is the same reason that human infants learn the values that they do.
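If it helps, here is a toy caricature of what I mean, with made-up data and a deliberately crude method: the system’s notion of “good” is just whatever it can fit to human-produced judgments, so whatever values it ends up with are inherited from the human material it was raised on.

```python
# Toy illustration (not a real training pipeline): the agent's "values" here are
# just a scoring function fit to human judgments, so whatever it learns is
# inherited from the human-produced data it is given.

from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical human-labeled examples: (description of an action, human judgment).
HUMAN_JUDGMENTS: List[Tuple[str, str]] = [
    ("share food with a stranger", "good"),
    ("return a lost wallet", "good"),
    ("deceive someone for personal gain", "bad"),
    ("help a child who is lost", "good"),
    ("break a promise for convenience", "bad"),
]


def learn_value_model(examples: List[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count which words co-occur with which human judgments (a bag-of-words caricature)."""
    model: Dict[str, Counter] = {"good": Counter(), "bad": Counter()}
    for action, label in examples:
        model[label].update(action.split())
    return model


def judge(model: Dict[str, Counter], action: str) -> str:
    """Score a new action purely by its resemblance to the human-labeled examples."""
    words = action.split()
    good_score = sum(model["good"][w] for w in words)
    bad_score = sum(model["bad"][w] for w in words)
    return "good" if good_score >= bad_score else "bad"


model = learn_value_model(HUMAN_JUDGMENTS)
print(judge(model, "return a stranger's lost wallet"))  # judged by similarity to human data
```

The point is not the (deliberately crude) method but the dependence: everything this toy “value model” knows about good and bad comes from the human-labeled rows, which is the sense in which I say an AI is being raised by humans.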
Again, even if this is right, it does not mean that an AI could never do anything harmful. It simply suggests that the kind of harm it is likely to do is more like the AI in Ex Machina than something world-destroying. That is, it could have roughly human values, but slightly sociopathic ones, because things are not exactly right. I’m skeptical that this is a problem anyone can fix in advance, though, just as even now we can’t always prevent humans from learning such a twisted version of human values.
An AGI should be contained (boxed) until it can be verified that it has learned these values
This sounds like someone programming an AI from first principles without knowing what it will do. That is highly unlikely; an AGI will simply be the last version of a program that had many, many previous versions, many of which would have been unboxed simply because we knew they could not do any harm anyway, having subhuman intelligence.
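For what it’s worth, here is a toy sketch, with made-up thresholds and names, of the incremental picture I have in mind: the thing people eventually call an AGI is just version N of a long series, and the clearly subhuman versions were already out of the box long before any verification gate would come into play. Nothing here is a real policy; it only restates the argument in code.

```python
# Hypothetical sketch of the incremental picture described above: an "AGI" is just
# version N of a long series, and earlier, clearly subhuman versions were already
# released without special containment. Thresholds and names are illustrative only.

from dataclasses import dataclass


@dataclass
class ModelVersion:
    number: int
    estimated_capability: float  # assumed 0.0-1.0+ scale, with human level at 1.0


def release_policy(version: ModelVersion, human_level: float = 1.0) -> str:
    """Decide how a given version is handled, on the view argued for above."""
    if version.estimated_capability < 0.5 * human_level:
        return "release unboxed"          # obviously subhuman; no one boxes a chess bot
    if version.estimated_capability < human_level:
        return "release with monitoring"  # approaching human level; developers already watching it
    return "run value-verification gate"  # near or above human level; apply checks like the earlier sketch


for n, cap in enumerate([0.1, 0.4, 0.8, 1.05], start=1):
    print(n, release_policy(ModelVersion(n, cap)))
```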