I think I see the point, but I'm not convinced it's actually feasible, because it requires a precommitment guaranteed by everyone. For humans, it seems intuitive that the winner (assuming she doesn't create paperclips) would use the power right away, which breaks the precommitment and thus renders it ineffective.
Does this objection make sense to you, or do you think I am confused by your proposal?
Indeed, players might follow a different strategy than the one they declare. A player can only verify another player's precommitment after pressing the button (or through old-fashioned espionage of their button setup). But I find it reasonable to expect that a player, seeing the shape of the AI race and what is needed to prevent mutual destruction, would actually design their AGI to use a decision theory that follows through on the precommitment. Humans may not be intuitively compelled by weird decision theories, but they can expect someone to write an AGI that uses them. And even a human may find giving other players what they deserve more important than letting the world as we know it continue for another decade.
Compare to Dr. Strangelove’s doomsday machine. We expect that a human in the loop would not follow through, but we can’t expect that no human would build such a machine.