If there is no good reason for an AI to be friendly (a belief which is plausible, but that I’ve never seen proven, and which is implied by the assumption that unfriendly AI is vastly more likely), then what’s left but hand-coded goals?
Unfriendly AI is only vastly more plausible if you're not doing it right. Out of the space of all possible preferences, human-friendly preferences are a tiny sliver; if you picked at random, you would almost surely get something as bad as a paperclipper.
As optimizers, we can try to aim at that sliver of human-friendly preferences ourselves, but we are poor optimizers in this domain relative to the complexity of the problem. A program could target that space far better than we can, and we are much, much more likely to be smart enough to write that program than to survive the success of an AI built on hand-coded goals and kill switches.
This is like going to the moon: Let the computer steer.