Yes.
To me it seems like all arguments for the importance of friendly AI rest on the assumption that its moral evaluation function must be exactly correct, or else it will necessarily become evil or insane through over-optimization of some weird aspect of that function.
However, with uncertainty in the system, such as limited knowledge of the past or uncertainty about what the evaluation function really is, optimization should take this into account and adopt strategies that keep its options open. In the paperclip example, this would mean not turning people into paperclips, because it suspects the paperclips might be for people.
Mathematically, an AI going evil or insane corresponds to it optimizing under only its single most probable hypothesis about what it should value, while pursuing multiple strategies corresponds to it integrating over the probabilities of the different possibilities.
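A rough sketch of that distinction in symbols (my own notation, not from the original discussion: h ranges over hypotheses about what the true evaluation function U_h is, P(h) is the AI's credence in each, and a ranges over actions):

```latex
% Collapsing onto the single most probable hypothesis (brittle):
\[
  a^{*} = \arg\max_{a}\, U_{\hat{h}}(a),
  \qquad \hat{h} = \arg\max_{h} P(h)
\]
% Integrating over all hypotheses about the evaluation function (robust):
\[
  a^{*} = \arg\max_{a}\, \sum_{h} P(h)\, U_{h}(a)
\]
```

Under the second rule, an action like turning people into paperclips scores badly as long as some non-negligible hypothesis assigns it a large negative value, which is the "keep its options open" behavior described above.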
I think the usual example assumes that the machine assigns a low probability to the hypothesis that paperclips are not the only valuable thing—because of how it was programmed.