Letting a bunch of AIs with given values resolve their disagreement is not the best way to merge values
[Edited] I agree that it is probably not the best way. Still, the idea of merging values by letting a bunch of AIs with given values resolve their disagreement seems better than previously proposed solutions, and perhaps gives a clue to what the real solution looks like.
BTW, I have a possible solution to the AI-extortion problem mentioned by Eliezer. We can set a lower bound on each delegate’s utility function at the status quo outcome (N possible worlds with equal probability, each shaped according to one individual’s utility function). Then any threat to cause an “extremely negative” outcome will be ineffective, since the “extremely negative” outcome will have utility equal to that of the status quo outcome.
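The proposal above can be sketched in a few lines: compute the status-quo value as the expected utility over the N equally likely individually-shaped worlds, then floor the delegate’s utility at that value. (This is a minimal illustration; all function names and the numeric toy utilities are hypothetical, not part of the original comment.)

```python
def status_quo_utility(utility, worlds):
    """Expected utility of the status quo: N possible worlds with equal
    probability, each shaped according to one individual's utility function."""
    return sum(utility(w) for w in worlds) / len(worlds)

def bounded_utility(utility, worlds):
    """Return a copy of `utility` floored at the status-quo value, so any
    outcome worse than the status quo is valued exactly at the status quo."""
    floor = status_quo_utility(utility, worlds)
    return lambda outcome: max(utility(outcome), floor)

# Toy example: a delegate whose raw utility is just the outcome's numeric value.
u = lambda x: x
worlds = [10, 0, 0]                     # status-quo expected utility = 10/3
u_bounded = bounded_utility(u, worlds)

# An "extremely negative" threatened outcome is now worth no less than the
# status quo, so the threat carries no leverage against this delegate.
assert u_bounded(-1000) == u_bounded(0)
```

With the floor in place, a threatener cannot push the delegate’s valuation below what it would get from the status quo anyway, which is the sense in which the extortion becomes ineffective.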