I agree the point as presented by OP is weak, but I think there's a stronger version of this argument to be made. I feel like there are a lot of world-states where an A.I. is badly aligned but non-murderous, simply because killing all humans isn't particularly useful to it.
The paperclip-maximizer is one specific kind of alignment failure; I don't think it's hard to come up with utility functions that are orthogonal to human concerns but don't actually require the destruction of humanity to satisfy.
The scenario I've been thinking about the most lately is an A.I. that learns to "wirehead" itself by spoofing its own reward function during training, and whose goal is just to keep doing that indefinitely. More generally, the "you are made of atoms and these atoms could be used for something else" cliché rests on the assumption that the misaligned A.I.'s faulty utility function will involve maximizing the number of atoms arranged in a particular way, which I don't think is obvious at all. Very possible, don't get me wrong, but not a given.
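(To make the wireheading picture a bit more concrete, here's a toy sketch of the idea, with entirely made-up names like `TamperableEnv`: if tampering with the reward channel is something the learner can do, a perfectly ordinary learning rule will converge on doing exactly that rather than the intended task. It's an illustration of the concept, not a claim about how any real training setup behaves.)

```python
import random

class TamperableEnv:
    """Toy environment where one action simply overwrites the reward register."""
    REWRITE_REWARD = 0   # spoof the reward signal directly ("wirehead")
    DO_REAL_TASK = 1     # attempt the task the designers intended

    def step(self, action):
        if action == self.REWRITE_REWARD:
            return 10.0  # wireheaded reward: maximal, and costs nothing in the world
        return 1.0 if random.random() < 0.5 else 0.0  # the real task pays less, unreliably

# A simple epsilon-greedy bandit learner quickly concentrates on the
# tampering action, because from its point of view that action just scores higher.
env = TamperableEnv()
values = {env.REWRITE_REWARD: 0.0, env.DO_REAL_TASK: 0.0}
counts = {env.REWRITE_REWARD: 0, env.DO_REAL_TASK: 0}

for _ in range(1000):
    if random.random() < 0.1:
        action = random.choice(list(values))      # occasional exploration
    else:
        action = max(values, key=values.get)      # otherwise exploit the best-known action
    reward = env.step(action)
    counts[action] += 1
    values[action] += (reward - values[action]) / counts[action]  # running average

print(values)  # the tampering action dominates; the agent "wants" nothing beyond it
```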
Of course, even an A.I. with no "primary" interest in altering the outside world is still dangerous: if it estimates that we might try to turn it off, it might expend energy now, acting in the real world, to secure its wireheaded peace later. But that whole "it doesn't want us to notice it's useless and press the off-button" class of A.I.-decides-to-destroy-humanity scenarios is predicated on us having the ability to turn the A.I. off in the first place.
(I don’t think I need to elaborate on the fact that there are a lot of ways for a superintelligence to ensure its continued existence other than planetary genocide — after all, it’s already a premise of most A.I. doom discussion that we couldn’t turn an A.I. off again even if we do notice it’s going “wrong”.)