“The truly fast way to produce a human-relative ideal moral agent is to create an AI with the interim goal of inferring the “human utility function” (but with a few safeguards built in, so it doesn’t, e.g., kill off humanity while it solves that sub-problem),”
That is three-laws-of-robotics-ism, and it won’t work. There’s no such thing as a safe superintelligence that doesn’t already share our values.
Surely there can be such super-intelligences. Imagine a (perhaps autistic) IQ-200 guy who just wants to stay in his room and play with his paperclips. He doesn’t really care about the rest of the world, he doesn’t care about extending his intelligence further, and the rest of the world doesn’t much care about his paperclips. Now replace the guy with an AI with the same values: it’s already quite super-intelligent, but it’s still safe, in the sense that it objectively poses no threat beyond the fact that the resources it spends playing with its paperclips could be used for something else. I have no problem scaling its intelligence much further and leaving it just as benign.
Of course, once it’s super-intelligent (in fact, quite a bit earlier), it may be very hard or impossible for us to determine that it’s safe. But then again, the same is true of humans, and quite a few of the billions of humans, past and present, are or have been very dangerous.
The difference between “X can’t be safe” and “X can’t be determined to be safe” is important; the first means “probability we live, given X, is zero”, and the other means “probability we live, given X, is strictly less than one”.
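To spell the distinction out in notation (my paraphrase of the sentence above, not something from the original exchange), write S for the event “we live”:

\[
\text{``X can't be safe''}:\ \Pr(S \mid X) = 0,
\qquad
\text{``X can't be determined to be safe''}:\ \Pr(S \mid X) < 1.
\]

That is, certain doom versus merely less-than-certain survival, which is how the comment reads the two phrases.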