That a bad result can happen under a given strategy is not a conclusive argument against preferring that strategy.
It’s possible that the AI would just happen never to confront a situation where it would choose differently from what everyone else would choose, but you couldn’t rely on that. If you had an AI that violated axiom 2, it would be tempting to modify it to include the special case “If X is the best option in expectation for every morally relevant agent, then do X.” It seems hard to argue that such a modification would not be an improvement. And yet merely tacking on that special case would make the agent no longer VNM-rational. Being worse than a VNM-irrational agent is pretty bad.
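To make that special case concrete, here is a rough sketch in Python; the names (`expected_utility`, `base_choice`, `agents`) are stand-ins I’m making up for illustration, not part of anyone’s actual proposal:

```python
def patched_choice(options, agents, expected_utility, base_choice):
    """Choose an option, overriding the base rule whenever one option is
    best in expectation for every morally relevant agent."""
    for x in options:
        # Special case: x is at least as good in expectation as every
        # alternative, for every agent.
        if all(expected_utility(agent, x) >= expected_utility(agent, y)
               for agent in agents for y in options):
            return x
    # Otherwise, defer to whatever the original (axiom-2-violating) agent does.
    return base_choice(options)
```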
Why is this new failure mode supposed to be decisive in the choice between the two alternatives?
Because maximizing a weighted sum of utility functions does not have any comparably convincing failure modes. None that I’ve heard of anyway, and I’d be pretty shocked if you came up with a failure mode that did compete.
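For concreteness, here is roughly what I mean by maximizing a weighted sum of utility functions; the particular weights and utility functions below are invented for the example:

```python
def weighted_sum_choice(options, utilities, weights):
    """Return the option maximizing the weighted sum of the agents' utilities."""
    return max(options,
               key=lambda x: sum(w * u(x) for w, u in zip(weights, utilities)))

# Example with two agents and three outcomes (numbers are made up):
options = ["a", "b", "c"]
utilities = [lambda x: {"a": 1, "b": 0, "c": 3}[x],
             lambda x: {"a": 0, "b": 3, "c": 1}[x]]
weights = [0.5, 0.5]
weighted_sum_choice(options, utilities, weights)  # -> "c" (weighted sum 2.0)
```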
You don’t think the utility monster is a comparably convincing failure mode?
I think we just don’t have data one way or the other.
The utility monster isn’t a failure mode. It just messes with our intuitions because no one could imagine being a utility monster.
Edit: At the time I made this comment, the Wikipedia article on utility monsters incorrectly stated that a utility monster is an agent that gets increasing marginal utility with respect to resources. Now that I know that a utility monster is an agent that gets much more utility from resources than other agents do, my response is that you can multiply the utility monster’s utility function by a small coefficient, so that it no longer acts as a utility monster.
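A toy illustration of that fix, with numbers I’m inventing here: if the monster gets 1000 times as much utility from each unit of resource as a normal agent does, give it a weight of 1/1000 in the weighted sum, and its weighted contribution lands back on the same scale as everyone else’s.

```python
# Hypothetical numbers, purely for illustration.
monster_utility = lambda r: 1000.0 * r   # 1000x the utility per unit of resource
normal_utility = lambda r: r

# Shrink the monster's weight by the same factor it is "monstrous" by:
monster_weight, normal_weight = 1 / 1000, 1.0

r = 5.0
# Both weighted contributions come out to 5.0, so a weighted-sum maximizer
# no longer has any special reason to funnel all resources to the monster.
print(monster_weight * monster_utility(r), normal_weight * normal_utility(r))
```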