A decision theory that doesn’t need to go through the motions of making a commitment outside the cognitive algorithm is superior. Act as if you have made a commitment in all the situations where you benefit from having made the commitment. Actually make a commitment only if it’s necessary to signal the resulting decision.
(Off-point:) The protagonist may well be rational about sacrificing his life, if he cares about stopping the antagonist’s plan more.
I believe I agree with the intuition. Does it say anything about a problem like the above, though? Does the villain decide not to poison the hero, because the hero would not open the box even if the villain decided to poison the hero? Or does the hero decide to open the box, because the villain would poison the hero even if the hero decided not to open the box? Is there a symmetry-breaker here? -- Do we get a mixed strategy à la the Nash equilibrium for Rock-Paper-Scissors, where each player randomizes over their choices with some probability?
(I’m assuming we’re assuming the preference orderings are: The hero prefers no poison to opening the box to dying; the villain prefers the box opened to no poison to the hero dying [because the latter would be a waste of perfectly good poison].)
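To make the symmetry question concrete, here is a minimal sketch (the cardinal payoffs, the two-stage game structure, and names like `solve` are my own assumptions, chosen only to respect the ordinal preferences above) that solves the poison/box game by backward induction, with and without each side's unilateral commitment:

```python
# Hypothetical payoffs consistent with the stated ordinal preferences.
# Hero:    no_poison > box_opened > hero_dies
# Villain: box_opened > no_poison > hero_dies
HERO = {"no_poison": 2, "box_opened": 1, "hero_dies": 0}
VILLAIN = {"box_opened": 2, "no_poison": 1, "hero_dies": 0}

def outcome(villain_poisons, hero_opens):
    if not villain_poisons:
        return "no_poison"
    return "box_opened" if hero_opens else "hero_dies"

def solve(hero_committed_not_to_open=False, villain_committed_to_poison=False):
    """Backward induction: the hero moves last, the villain anticipates him."""
    def hero_choice(villain_poisons):
        if hero_committed_not_to_open:
            return False
        # The hero picks whichever response maximizes his own payoff.
        return max([True, False],
                   key=lambda opens: HERO[outcome(villain_poisons, opens)])
    if villain_committed_to_poison:
        poisons = True
    else:
        poisons = max([True, False],
                      key=lambda p: VILLAIN[outcome(p, hero_choice(p))])
    return outcome(poisons, hero_choice(poisons))

print(solve())                                 # box_opened: classical subgame-perfect outcome
print(solve(hero_committed_not_to_open=True))  # no_poison:  hero's commitment wins
print(solve(villain_committed_to_poison=True)) # box_opened: villain's commitment wins
print(solve(True, True))                       # hero_dies:  both dig in, worst case for both
```

Under these toy numbers, whoever manages to commit unilaterally gets their preferred outcome, and two simultaneous rock-solid commitments land on the hero dying, which is exactly the symmetry worry above.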
I’m not sure why I’m getting downmodded into oblivion here. I’ll go out on a limb and assume that I was being incomprehensible, even though I’ll be digging myself in deeper if that wasn’t the reason...
In classical game theory (subgame-perfect equilibrium), if you eat my chocolate, it is not rational for me to tweak your nose in retaliation at cost to myself. But if I can first commit myself to tweaking your nose if you eat my chocolate, it is no longer rational for you to eat it. But if you can commit even earlier to definitely eating my chocolate, even if I then commit to tweaking your nose, it is (still in classical game theory) no longer rational for me to commit to tweaking your nose! The early committer gets the good stuff.
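A rough sketch of that ladder of commitments for the chocolate game (the payoff numbers are my own assumptions, not anything from the post): each added layer of commitment flips what backward induction recommends.

```python
# Hypothetical payoffs: eating the chocolate gains the eater 1 and costs the
# owner 1; a nose-tweak costs the tweaker 1 and the eater 2.
def payoffs(eats, tweaks):
    eater = (1 if eats else 0) - (2 if tweaks else 0)
    owner = (-1 if eats else 0) - (1 if tweaks else 0)
    return eater, owner

def best(options, value):
    return max(options, key=value)

# Case 1: no commitments. I choose whether to tweak after you have eaten;
# retaliation only costs me, so I don't tweak, and knowing that, you eat.
tweak_after_eating = best([True, False], lambda t: payoffs(True, t)[1])           # False
you_eat = best([True, False], lambda e: payoffs(e, e and tweak_after_eating)[0])  # True

# Case 2: I commit first to tweaking if you eat. Eating now nets you -1
# instead of +1, so you don't eat.
you_eat_vs_commitment = best([True, False], lambda e: payoffs(e, e)[0])           # False

# Case 3: you commit even earlier to eating no matter what. Given that, a
# commitment to tweak only hurts me, so (classically) I don't make it.
i_commit_to_tweak = best([True, False], lambda c: payoffs(True, c)[1])            # False

print(you_eat, you_eat_vs_commitment, i_commit_to_tweak)  # True False False
```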
Eliezer’s arguments have convinced me that a better decision theory would work like Vladimir says, acting as if you had made a commitment in all situations where you would like to make a commitment. But as far as I can see, both the nose-tweaker and the chocolate-eater can do that—speaking in intuitive human terms, it comes down to who is more stubborn. So what does happen? Is there a symmetry breaker? Can it happen that you commit to eating my chocolate, I commit to tweaking your nose, and we end up in the worst possible world for both of us? (Well, I’m pretty confident that that’s not what Eliezer’s theory (not shown) would do.)
Borrowing from classical game theory, perhaps we say that one of the two commitment scenarios happens, but we can’t say which (1. you eat my chocolate and I don’t tweak your nose; 2. you don’t eat my chocolate, which is a good thing because I would tweak your nose if you did). In the simple commitment game we’re considering here, this amounts to considering all Nash equilibria instead of only subgame-perfect equilibria (Nash = “no player can do better by changing their strategy”—but I’m allowed to counterfactually tweak your nose at cost to myself if we don’t actually reach that part of the game tree at equilibrium). But of course, if you accept Eliezer’s arguments, Nash equilibrium is wrong in general, and in any case, it’s not obvious to me whether “either of the two scenarios can happen” is the right solution to this game.
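For what it's worth, here is the same toy game in strategic form (same assumed payoffs as in the earlier sketch), enumerating the pure-strategy Nash equilibria: both scenarios show up as equilibria, but only the first is subgame-perfect.

```python
from itertools import product

# Assumed payoffs, as before: eating gains the eater 1 and costs the owner 1;
# a nose-tweak costs the tweaker 1 and the eater 2.
def payoffs(eat, tweak_if_eaten):
    tweak_happens = eat and tweak_if_eaten
    eater = (1 if eat else 0) - (2 if tweak_happens else 0)
    owner = (-1 if eat else 0) - (1 if tweak_happens else 0)
    return eater, owner

def is_nash(eat, tweak_if_eaten):
    """Nash: no player can strictly improve by unilaterally changing strategy."""
    e0, o0 = payoffs(eat, tweak_if_eaten)
    if payoffs(not eat, tweak_if_eaten)[0] > e0:
        return False
    if payoffs(eat, not tweak_if_eaten)[1] > o0:
        return False
    return True

for eat, tweak_if_eaten in product([True, False], repeat=2):
    if is_nash(eat, tweak_if_eaten):
        print(f"eat={eat}, tweak_if_eaten={tweak_if_eaten}, "
              f"payoffs={payoffs(eat, tweak_if_eaten)}")
# Prints two pure equilibria: (eat, don't tweak), the subgame-perfect one,
# and (don't eat, tweak if eaten), which rests on the off-path threat.
```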
To make the implicit motivation behind these two comments explicit: I’m worried that there’s a danger of writing “the rightful owner will keep their chocolate” on the bottom line, noticing that a proper decision theory would allow them to retaliate, and saying “done!” without even considering whether the same logic allows the nefarious villain to spitefully commit to eating the chocolate anyhow. If the theory says that either of the two commitment outcomes may happen, ok, but I think it deserves mention. And if the theory says something else, I want to know that too. :-)
You can’t argue with a rock, so you can’t stop a rock-solid commitment, even with your own rock-solid commitment. But you can solve the game given the commitments, with the outcome for each side. If this outcome is inferior to other possible commitments, then those other commitments should be used instead.
So, if the hero expects that his commitment to die will still result in the villain making him die, this commitment is not a good idea and shouldn’t be made (for example, maybe the villain just wants to play the game). The tricky part is that if the hero expected his commitment to stop the villain, he still needs to dutifully die once the villain surprises him, to the extent this would be necessary to communicate the commitment to the villain prior to his decision, since it’s precisely this communicated model of behavior that was supposed to stop him.