But many people don’t like this, usually for reasons involving utility monsters. If you are one of these people, then you better learn to like it, because according to Harsanyi’s Social Aggregation Theorem, any alternative can result in the supposedly Friendly AI making a choice that is bad for every member of the population. More formally,
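Roughly, the standard form of the result (a paraphrase of Harsanyi 1955, not necessarily the exact statement the post goes on to give): if each individual's preferences over lotteries satisfy the VNM axioms, the social preference also satisfies them, and the social preference agrees with any unanimous individual preference (a Pareto condition), then the social utility function must be an affine combination of the individual utility functions,

```latex
% Harsanyi's Social Aggregation Theorem, standard form (paraphrased):
% VNM-rational individuals + VNM-rational social preference + Pareto condition
% imply the social utility is an affine combination of individual utilities.
U_{\text{social}}(x) \;=\; c \;+\; \sum_{i=1}^{n} w_i \, u_i(x),
\qquad w_i \ge 0 \ \text{(strictly positive under a strong Pareto condition).}
```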
That a bad result can happen in a given strategy is not a conclusive argument against preferring that strategy. Will it happen? What’s the likelihood that it happens? What’s the cost if it does happen?
The two alternatives discussed each have their own failure mode, while your “better learn to like it” admonition seems to imply that one side is compelled by the failure mode of its preferred strategy to give it up for the alternative strategy.
Why is this new failure mode supposed to be decisive in the choice between the two alternatives?
That a bad result can happen in a given strategy is not a conclusive argument against preferring that strategy.
It’s possible that the AI would just happen never to confront a situation in which it would choose differently from how everyone else would, but you can’t rely on that. If you had an AI that violated axiom 2, it would be tempting to modify it to include the special case “If X is the best option in expectation for every morally relevant agent, then do X.” It seems hard to argue that such a modification would not be an improvement. And yet merely throwing in that special case would make it no longer VNM-rational, which means the unmodified agent is worse than a VNM-irrational agent, and that is pretty bad.
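To make that concrete, here is a toy example (my own numbers, not from the thread): two agents, a fair 50/50 lottery L, and a sure outcome C. An aggregator that maximizes the expected minimum utility, one example of a non-weighted-sum rule, picks C even though both agents prefer L in expectation:

```python
# Toy illustration (made-up numbers): two agents, a fair 50/50 lottery L and a
# sure outcome C.  An aggregator that maximizes the expected *minimum* utility
# (one non-weighted-sum rule) picks C even though both agents prefer L in
# expectation -- exactly the "bad for everyone" choice the theorem warns about.

# Each option is a list of (probability, (u1, u2)) pairs.
L = [(0.5, (1.0, 0.0)), (0.5, (0.0, 1.0))]   # risky but fair lottery
C = [(1.0, (0.4, 0.4))]                      # safe but mediocre outcome

def expected_individual_utilities(option):
    """Each agent's expected utility under the option."""
    return tuple(sum(p * us[i] for p, us in option) for i in range(2))

def expected_min_welfare(option):
    """Non-utilitarian aggregator: expected utility of the worse-off agent."""
    return sum(p * min(us) for p, us in option)

def weighted_sum_welfare(option, weights=(0.5, 0.5)):
    """Harsanyi-style aggregator: weighted sum of expected utilities."""
    return sum(w * eu for w, eu in zip(weights, expected_individual_utilities(option)))

for name, option in [("L", L), ("C", C)]:
    print(name,
          "expected utilities:", expected_individual_utilities(option),
          "| E[min]:", expected_min_welfare(option),
          "| weighted sum:", weighted_sum_welfare(option))

# Both agents get 0.5 from L and only 0.4 from C, yet E[min] ranks C (0.4)
# above L (0.0); the weighted-sum rule sides with the unanimous preference.
```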
Why is this new failure mode supposed to be decisive in the choice between the two alternatives?
Because maximizing a weighted sum of utility functions does not have any comparably convincing failure modes. None that I’ve heard of anyway, and I’d be pretty shocked if you came up with a failure mode that did compete.
You don’t think utility monster is a comparably convincing failure mode?
I think we just don’t have data one way or the other.
Utility monster isn’t a failure mode. It just messes with our intuitions because no one could imagine being a utility monster.
Edit: At the time I made this comment, the Wikipedia article on utility monsters incorrectly stated that a utility monster meant an agent that gets increasing marginal utility with respect to resources. Now that I know that a utility monster means an agent that gets much more utility from resources than other agents do, my response is that you can multiply the utility monster’s utility function by a small coefficient, so that it no longer acts as a utility monster.
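To spell that out with illustrative numbers (my own, purely for the sake of example): suppose the monster gains 100 units of utility per unit of resource while an ordinary agent gains 1. Giving the monster's utility function a small enough weight means each unit of resource raises the weighted sum more when it goes to an ordinary agent, so the aggregator stops feeding the monster:

```latex
% Weighted-sum aggregation with the monster's weight scaled down
% (illustrative numbers: u_m(r) = 100 r for the monster, u_i(r) = r otherwise).
W(x) \;=\; w_m \, u_m(x) \;+\; \sum_{i \neq m} w_i \, u_i(x),
\qquad 100\, w_m \;<\; w_i .
```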