Including Value X in the aggregation is easy: just include a term in the aggregated utility function that depends on which aggregation is used in the future. The hard part is maximizing such an aggregated utility function. If Value X already takes up enough of the utility function, an AI maximizing the aggregation might simply replace its utility function with Value X and start maximizing that. Otherwise, the AI would probably ignore Value X’s preference to be the only value represented in the aggregation, since complying would cost it more utility elsewhere than it gains. There’s no point to the lottery you suggest, since a lottery between two outcomes cannot have higher utility than either of the outcomes themselves. And if Value X is easily satisfied by silly technicalities, the AI could build a second AI with the aggregated utility function, make sure that second AI becomes more powerful than itself, and then replace its own utility function with Value X.
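To spell out the lottery point in standard expected-utility notation (the symbols here are just illustrative, not anything from your comment): for a lottery that yields outcome $A$ with probability $p$ and outcome $B$ otherwise,

$$U(\text{lottery}) = p\,U(A) + (1-p)\,U(B) \le \max\big(U(A),\,U(B)\big),$$

since a weighted average can never exceed the larger of its two terms. So a maximizer always does at least as well by just picking whichever outcome it prefers outright.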
I don’t think your Blue Cult example works very well, because for them, the preference for everyone to join the Blue Cult is an instrumental rather than a terminal value.
Thank you very much for helping me break that down!