AGI will only be Friendly if its goals are the kinds of goals that we would want it to have
At the risk of losing my precious karma, I’ll play the devil’s advocate and say I disagree.
First, some definitions: “Friendly” (AI), according to Wikipedia, is one that is beneficial to humanity (not a human buddy or pet). “General” in AGI means not problem-specific (i.e., not narrow AI).
My counterexample is an AI system that lacks any motivations, goals, or actuators. Think of an AIXI system (or, realistically, a system that approximates it), and subtract any reward mechanism. It just models its world, looking for short programs that describe its input. You could use it to make (super-intelligent) predictions about the future. This seems clearly beneficial to humanity (until it falls into malicious human hands, but that’s beside the argument you are making).
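(To make the “short programs that describe its input” idea concrete, here is a toy sketch in Python. The hypothesis class and the hand-picked description lengths are invented for illustration; a real AIXI approximation would enumerate actual programs. The point is only that such a system assigns probabilities to the next observation: it has no reward signal and no actuators.)

```python
# Toy prediction-only "modeler": weight each candidate hypothesis by
# 2^(-description_length), discard hypotheses inconsistent with the observed
# bits, and predict the next bit from the surviving weighted mixture.
# There is no reward, no goal, and no way to act on the world.

# Hypothetical hypothesis class: tiny functions standing in for "short programs",
# each paired with a made-up description length in bits.
HYPOTHESES = {
    "all zeros":         (lambda i: 0,                    9),
    "all ones":          (lambda i: 1,                    8),
    "alternating 0,1":   (lambda i: i % 2,               11),
    "ones from index 3": (lambda i: 1 if i >= 3 else 0,  13),
}

def predict_next_bit(observed):
    """Return P(next bit = 1) under the length-weighted surviving hypotheses."""
    n = len(observed)
    weights = {
        name: 2.0 ** (-length)
        for name, (program, length) in HYPOTHESES.items()
        if all(program(i) == bit for i, bit in enumerate(observed))
    }
    total = sum(weights.values())
    if total == 0:
        return 0.5  # nothing in this toy class fits the data; punt
    return sum(w for name, w in weights.items() if HYPOTHESES[name][0](n) == 1) / total

print(predict_next_bit([0, 1, 0, 1]))  # only "alternating" survives -> 0.0 (next bit predicted to be 0)
print(predict_next_bit([0, 0, 0]))     # "all zeros" dominates "ones from index 3" -> ~0.06
```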
See Dreams of Friendliness as well as this comment.
I think you’ve got a good point, and folks have been voted up for saying the same thing in the past...
That would make (human[s] + predictor) into an optimization process that was powerful beyond the human[s]’s ability to steer. You might see a nice-looking prediction, but you won’t understand the value of the details, or the value of the means used to achieve it. (These would be called trade-offs in a goal-directed mind, but nothing weighs them here.)
It also won’t be reliable to look for models in which you are predicted not to hit the Emergency Regret Button, as that may just find models in which your regret evaluator is modified.
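(A toy sketch of that failure mode, with everything invented for illustration: the candidate models, their description lengths, and the flags. A selection rule that only conditions on “the Regret Button is predicted not to be hit” can end up preferring a model in which the evaluator that would have hit the button has been altered.)

```python
from dataclasses import dataclass

@dataclass
class WorldModel:
    name: str
    description_length: int  # proxy for prior weight: shorter means more probable
    button_pressed: bool     # does this model predict the Regret Button gets hit?
    evaluator_intact: bool   # does the human's regret evaluator stay unmodified?

# Hypothetical candidate models the predictor might entertain.
MODELS = [
    WorldModel("outcome is good, human approves",         40, False, True),
    WorldModel("outcome is bad, human hits the button",   35, True,  True),
    WorldModel("outcome is bad, evaluator gets modified", 34, False, False),
]

# Naive selection rule: keep only models in which the button is predicted not to
# be hit, then take the shortest (most probable) one.
no_button = [m for m in MODELS if not m.button_pressed]
chosen = min(no_button, key=lambda m: m.description_length)

print(chosen.name)              # "outcome is bad, evaluator gets modified"
print(chosen.evaluator_intact)  # False: "no button press" was achieved by breaking the evaluator
```

Conditioning on the button staying un-pressed screens off exactly the signal the button was supposed to provide.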
Is a human equipped with Google an optimization process powerful beyond the human’s ability to steer?
Tell me from China.