First, suppose one of the utility functions in the set is erroneous, and the AI predicts that we will eventually realize this and build a different AI that optimizes without that value. The original AI would then be incentivized to prevent the creation of that successor, or to modify it so that it still includes the erroneous value.
The utility functions are normalized so that they all assign 0 to the status quo, and the status quo already includes humans going on to design an AI to optimize something. So the minimax agent won't do anything worse for the later AI's values than what would have happened anyway, unless the future AI's utility function is missing from the minimax agent's ensemble.
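To spell the argument out, here is a minimal formalization of the normalized minimax objective (notation mine, not necessarily the original post's): with ensemble $\mathcal{U} = \{U_1, \dots, U_n\}$ and status quo outcome $s$, the agent picks

$$a^* \in \arg\max_{a} \; \min_{1 \le i \le n} \big( U_i(a) - U_i(s) \big),$$

so every candidate action is evaluated by its worst-case improvement over the status quo. Because $s$ already includes the later AI being built, any action the minimax agent prefers to $s$ must be weakly better than $s$ under every $U_i$, including the later AI's utility whenever it is in the ensemble.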
The second issue is that if one of the utility functions is offset so that its scores sit well below the others', the remaining utility functions will be crowded out of the AI's attention and resource allocation.
Since they're all normalized to return 0 on the status quo, this exact failure won't happen. But one utility function could still be much harder to raise above 0 than the others, in which case most of the agent's resources will go toward raising that one above 0 rather than the others.
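As a toy illustration of that residual worry (hypothetical names and numbers, not anything from the original discussion): a brute-force minimax agent splitting a fixed resource budget across three status-quo-normalized utilities, one of which is much harder to raise above 0, ends up spending most of the budget on the hard one.

```python
# Minimal sketch, assuming linear utilities normalized to 0 at the status quo
# (zero resources spent). The agent maximizes the minimum normalized utility.
from itertools import product

# Hypothetical slopes: how much each utility gains per unit of resources.
SLOPES = {"easy_a": 1.0, "easy_b": 0.8, "hard": 0.1}
BUDGET = 10  # discrete units of resources to allocate

def min_utility(allocation):
    """Worst-case (minimum) normalized utility across the ensemble."""
    return min(SLOPES[name] * units for name, units in allocation.items())

def best_allocation(budget):
    """Brute-force the allocation that maximizes the minimum utility."""
    names = list(SLOPES)
    best, best_score = None, float("-inf")
    for split in product(range(budget + 1), repeat=len(names)):
        if sum(split) != budget:
            continue
        alloc = dict(zip(names, split))
        score = min_utility(alloc)
        if score > best_score:
            best, best_score = alloc, score
    return best, best_score

if __name__ == "__main__":
    alloc, score = best_allocation(BUDGET)
    print(alloc, score)
    # Prints {'easy_a': 1, 'easy_b': 1, 'hard': 8} 0.8 -- most of the budget
    # goes to "hard", the utility that is hardest to raise above 0.
```

Nothing here ends up below the status quo, so this is a distribution-of-effort issue rather than a safety failure, but it shows how the hardest-to-improve value dominates the agent's resource allocation.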