I wonder how hard it would be to self-modify, prior to the imposition of the sort of regime discussed here, into a counterfactual utility monster (along the lines of “I prefer X if Z and prefer not-X if not-Z”) who very very much wants to be (and thus becomes?) an actual utility monster iff being a utility monster is rewarded. If this turns out to be easy, then those contemplating the imposition would need to take into account the odds that it has already happened in secret, before the utility-monster-rewarding regime was ever launched.
It would be ironic if the regime were launched and, in the course of surveying preferences at its outset, its launchers discovered the counterfactual utility monster’s “moral booby-trap” and became its hostages. Story idea! Someone launches a simple preference aggregation regime, discovers a moral booby-trap, and is horrified at what is likely to happen when the survey ends and the regime gets down to business… then they discover a second counterfactual utility monster booby-trap lurking in someone else’s head, one designed with naive booby-traps in mind, which thwarts the first. The second monster also has room in its utility function to grant “utility monster empathy sops” to the launchers of the regime, and they are overjoyed that someone managed to save them from their own initial hubris, even though they would have been horrified had they discovered only the non-naive monster, with no naive monster to serve as a contrast object. Utility for everyone but the naive monster: happy ending!
Linearly combining utility functions does not force you to reward utility monsters. It only forces consistency in the trade-offs: either you are willing to sacrifice large amounts of others’ utility for extremely large amounts of utility-monster utility, or you are unwilling to sacrifice small amounts of others’ utility for proportionally large amounts of utility-monster utility; the exchange ratio is the same at every scale. And the normalization scheme could require the range of every normalized utility function to fit within fixed bounds, as in the sketch below.
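A minimal sketch of that last point, assuming a simple min-max normalization to [0, 1] (just one of many possible bounding schemes, not a specific proposal from the post): once every agent’s utility function is rescaled into the same bounded range, linear aggregation caps how much any single agent, monster or not, can swing the total.

```python
def normalize(utilities):
    """Rescale an {outcome: utility} dict so its values span [0, 1]."""
    lo, hi = min(utilities.values()), max(utilities.values())
    span = (hi - lo) or 1.0  # an indifferent agent contributes a flat 0
    return {o: (u - lo) / span for o, u in utilities.items()}

def aggregate(agents, weights=None):
    """Linearly combine normalized utility functions over shared outcomes."""
    normalized = [normalize(a) for a in agents]
    weights = weights or [1.0] * len(agents)
    return {o: sum(w * n[o] for w, n in zip(weights, normalized))
            for o in normalized[0]}

ordinary = {"feed_monster": 0.0, "feed_everyone": 10.0}
monster = {"feed_monster": 1e9, "feed_everyone": 0.0}  # astronomically strong raw preferences

print(aggregate([ordinary, monster]))
# -> {'feed_monster': 1.0, 'feed_everyone': 1.0}
# After normalization the monster's enormous raw numbers buy it no extra
# pull: each agent can contribute at most 1.0 to any outcome's total.
```

Under a bounded scheme like this, the monster still trades off against others at a fixed ratio, but its influence is capped at one agent’s worth of weight rather than scaling with the magnitude of its raw utilities.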