I agree. Note though that the beliefs I propose aren’t actually false. They are just different from what humans believe, and there is no way to verify which of them is correct.
You are right that it could lead to behavior that looks strange from the point of view of a human, who has different priors than the AI. However, that is kind of the point of the theory. After all, the plan is to deliberately induce behaviors that are beneficial to humanity.
The question is: after giving an AI strange beliefs, would the unexpected effects outweigh the planned effects?