That would seem to be a very easy thing for them to test. Unless we keep committing atrocities every now and again to fool them, they’re going to work out that it’s false. Even if they do believe us (or it’s true), that belief would itself be a good argument for our leaders to want to start the war, leading to the conclusion that they should do so to gain the first-strike advantage and maximise their chances.
If possible, it would seem better to convince them in some way that doesn’t require us to pay such a cost: to convince the enemy that we’re generally rational, reasonable people, except in circumstances where they attack us.
Many countries involved in protracted disputes do commit atrocities against third parties every now and again; perhaps not for this reason, though.
The problem is that “generally rational, reasonable people” will generally remain so even if attacked. It’s much easier to convince an enemy that you are irrational, to some extent. If you can hide your level of rationality, then in a game like MAD you increase your expected score and reduce your opponent’s by reducing the information available to them.
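To make that concrete, here is a minimal toy model of the point about concealment (all payoffs, probabilities, and the attacker’s decision rule are my own assumptions, not from the thread): an attacker who knows your true willingness to retaliate is low will strike, while one forced back on an uncertain prior is deterred, which is the sense in which hiding your level of rationality raises your expected score.

```python
# Toy MAD model: illustrative numbers only, chosen for this sketch.
ATTACK_GAIN = 10      # attacker's gain from an unanswered first strike
MUTUAL_LOSS = -100    # both sides' payoff if the strike is answered
BEING_STRUCK = -50    # defender's payoff if struck and it does not retaliate
STATUS_QUO = 0        # payoff to both sides if no attack happens

def attacker_ev(p_retaliate_believed):
    """Attacker's expected payoff from striking, given its belief about retaliation."""
    return (1 - p_retaliate_believed) * ATTACK_GAIN + p_retaliate_believed * MUTUAL_LOSS

def defender_ev(p_retaliate_true, p_retaliate_believed):
    """Defender's expected payoff when the attacker acts on its belief."""
    if attacker_ev(p_retaliate_believed) > STATUS_QUO:   # attacker strikes
        return p_retaliate_true * MUTUAL_LOSS + (1 - p_retaliate_true) * BEING_STRUCK
    return STATUS_QUO                                    # deterrence holds

# A defender whose true willingness to retaliate is only 5%:
print(defender_ev(0.05, 0.05))  # transparent about it: attacker strikes, about -52.5
print(defender_ev(0.05, 0.50))  # conceals it (attacker must use a 50% prior): 0, deterred
```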
One difference between MAD and the Omega mugging is that Omega is defined so as to make any such concealment useless.
ETA: This (short and very good) paper by Yamin Htun discusses the kind of irrationality I mean. Quote:
the rational players disguise themselves as irrational; they make others believe they are altruistic, thus forcing others to play cooperatively.
Substitute “anti-altruistic” for “altruistic” and this is what I was aiming at.
But that fooling can only go so far. The better your opponent is at probing your irrational mask, the higher the risk of them spotting a bluff, and thus the narrower the gap between acting irrational and being irrational. Only by being irrational can you be sure they won’t spot the lie.
Beyond a certain payoff ratio, the risk from being caught out lying becomes bigger than the risk of having to carry through. For that reason, you end up appointing officers who will actually carry through, even to the point of blind-testing them with simulations and removing from such positions those who don’t fire (even if not firing was the right choice), and letting your opponent know and verify this as much as possible.
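A back-of-the-envelope version of that threshold, with every number invented purely for illustration: once the penalty for a detected bluff is large enough relative to how rarely a verified commitment actually gets tested, genuinely committing becomes the cheaper option.

```python
# All probabilities and costs below are made up to illustrate the threshold.
P_BLUFF_DETECTED   = 0.10   # chance the opponent sees through the irrational mask
P_ATTACK_IF_CAUGHT = 1.00   # a detected bluff invites a first strike
P_ATTACK_IF_REAL   = 0.01   # a verified commitment is rarely tested
COST_BEING_STRUCK  = 100
COST_RETALIATING   = 100    # the extra cost of actually carrying through

ev_cost_bluff  = P_BLUFF_DETECTED * P_ATTACK_IF_CAUGHT * COST_BEING_STRUCK
ev_cost_commit = P_ATTACK_IF_REAL * (COST_BEING_STRUCK + COST_RETALIATING)

print(ev_cost_bluff, ev_cost_commit)  # 10.0 vs 2.0: at this ratio, real commitment is cheaper
```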
If I can take this back to the “agents maximising their utility” interpretation: this is then a genuine example of a brain hack, the brain in this case being the institutional decision structure of a Cold War government (let’s say the Soviets). Having decided that the only win in the possible world where America has attacked comes from massive retaliation, and having realised that as currently constituted the institution would not retaliate under those circumstances, the institution modified itself so that it would. I find it interesting that it would have to use irrational agents (the retaliatory officers) as part of its decision structure in order to achieve this.
This points to another difference between Omega mugging and MAD: whereas in the former it’s assumed you have the chance to modify yourself between Omega appearing and your making the decision, in the MAD case it is deliberately arranged that retaliation is immediate and automatic (corresponding to removing the ability not to retaliate from the Soviet command structure).
Yes—it is effectively the organisational level of such a brain hack (though it would be advantageous if the officers were performing such a hack on their own brains, rather than being irrational in general—rationality in other situations is a valuable property in those with their fingers on the button.)
In the MAD case, it is deliberately arranged that retaliation is immediate and automatic
Isn’t that exactly the same as the desired effect of your brain-hack in the mugging situation? Instead of removing the ability to not retaliate, we want to remove the ability to not pay. The methods differ (selecting pre-hacked / appropriately damaged brains to make the decisions, versus hacking our own), but the outcome seems directly analogous. Nor is there any further warning: the mugging situation finds you directly in the loss case (as you’d presumably be directly in the win case if the coin flip went differently) potentially before you’d even heard of Omega. Any brain-hacking must occur before the situation comes up unless you’re already someone who would pay.
Isn’t that exactly the same as the desired effect of your brain-hack in the mugging situation? Instead of removing the ability to not retaliate, we want to remove the ability to not pay… the mugging situation finds you directly in the loss case … potentially before you’d even heard of Omega.
OK, so to clarify, the problem you’re considering is the one where, with no preparation on your part, Omega appears and announces tails?
EDIT: Oops. Clearly you don’t mean that. Do you want me to imagine a general hack we can make that increases our expected utility conditional on Omega appearing, but that we can profitably make even without having proof or prior evidence of Omega’s existence?
EDIT 2: I do want to answer your question “Isn’t that exactly the same as the desired effect of your brain-hack in the mugging situation?”, but I’d rather wait on your reply to mine before I formulate it.
Yes, exactly. I think this post by MBlume gives the best description of the most general such hack needed:
If there is an action to which my past self would have precommited, given perfect knowledge, and my current preferences, I will take that action.
By adopting and sticking to such a strategy, I will on average come out ahead in a wide variety of Newcomblike situations. Obviously the actual benefit of such a hack is marginal, given the unlikeliness of an Omega-like being appearing and of my believing it. Since I’ve already invested the effort of thinking through the optimal route for the thought experiment, though, I believe I am now in fact hacked to hardcode the future-irrational decision if it does occur.
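For concreteness, a quick sketch of the averages involved, assuming the payoffs from the standard statement of the counterfactual mugging (the $10,000 heads-side prize is an assumption on my part; only the $100 figure appears in this thread):

```python
# Expected value of each strategy, evaluated before the coin is flipped.
P_HEADS, PRIZE, COST = 0.5, 10_000, 100

ev_pay_on_tails    = P_HEADS * PRIZE + (1 - P_HEADS) * -COST   # 4950.0
ev_refuse_on_tails = P_HEADS * 0     + (1 - P_HEADS) * 0       # 0.0: Omega predicts the refusal

print(ev_pay_on_tails, ev_refuse_on_tails)
```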
By adopting and sticking to such a strategy, I will on average come out ahead in a wide variety of Newcomblike situations.
Definitely.
I believe I am now in fact hacked to hardcode the future-irrational decision if it does occur.
Here lies my problem. I would like to adopt such a strategy (or a better one if any exists), and not alter my strategy when I actually encounter a Newcomblike situation. Now in the original Newcomb problem, I have no reason to do so: if I alter my strategy so as to two-box, then I will end up with less money (although I would have difficulties proving this in the formalism I use in the article). But in the mugging problem, altering my strategy to “keep $100 in this instance only” will, in an (Omega appears, coin is tails) state, net me more money. Therefore I believe that keeping to my strategy must have intrinsic value to me, greater than that of the $100 I would lose, in order for me to keep it.
Now I can answer your question about how the MAD brain-hack and the mugging brain-hack are related. In the MAD situation, the institution’s actions are “hardcoded” to occur. In the case of the mugging brain-hack, this would correspond to, say, wiring a device to one’s brain that takes over in Omega situations. This may well be possible in some situations, but I wanted to deal with the harder problem of how to fashion a brain that, on learning it is in a “tails” state, does not then want to remove such a hack.
Now if I expect to be faced with many Omega mugging problems in the future, then a glimmer of hope appears; although “keep $100 in this instance only” may then seem to be an improved strategy, I know that this conclusion must in fact be incorrect, as whatever process I use to arrive at it is, if allowed to operate, highly likely to lose money for me in the future. In other words, this makes the problem more similar to Newcomb’s problem: the states of the world in which I make the modification are the ones in which I lose money, just as the states of the world in which I two-box are the ones in which I make less money. But the problem as posed involves an Omega turning up and convincing you that this is the last Newcomblike problem you will ever face.
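Illustrative arithmetic for that repeated case (the $10,000 prize and the count of ten future muggings are my own assumed figures): a decision process that lets “keep the $100 this once” through will, by the same reasoning, let it through next time, and so forfeits every future heads-side payout.

```python
# Total money over one current tails-mugging plus N assumed future muggings.
P_HEADS, PRIZE, COST = 0.5, 10_000, 100
N_FUTURE = 10  # assumed number of future Omega encounters

# The process that reneges now also reneges later: keep $100 on every tails, win nothing on heads.
total_if_renege = COST + N_FUTURE * ((1 - P_HEADS) * COST + P_HEADS * 0)        # 600.0
# The process that pays now keeps paying: lose $100 on tails, collect the prize on heads.
total_if_pay    = -COST + N_FUTURE * ((1 - P_HEADS) * -COST + P_HEADS * PRIZE)  # 49400.0

print(total_if_renege, total_if_pay)
```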
ETA: In case it wasn’t clear, if I assign keeping my strategy an intrinsic value greater than that of keeping the $100, then I will surely keep my strategy. My question is: in the case of Omega appearing and my becoming convinced that I am facing my last ever Newcomblike problem, will keeping my strategy still have intrinsic value to me?
It all depends on how the hack is administered. If future-me does think rationally, he will indeed come to the conclusion that he should not pay. Any brain-hack that will actually be successful must then be tied to a superseding rational decision or to something other than rationality. If not tied to rationality, it needs to be a hardcoded response, immediately implemented, rather than one that is thought about.
There are obvious ways to set up a superseding condition: put $101 in escrow, or hire an assassin to kill you if you renege. But the cost of doing this now is far higher than the probability of the situation justifies, so we need something completely free. One option is to tie it to something internally valued: e.g. you value your given word or your self-honesty sufficiently that living with yourself after compromising it is worse than losing the $100. (This only scales to the point where you value your integrity, however: you may be able to live with yourself better after finding you’re self-deluding than after murdering 15 people to prove a point.)
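A rough expected-value check on why the commitment device has to be essentially free (the probability of the situation and the heads-side prize below are my own assumed figures):

```python
# Even a tiny up-front cost swamps the expected benefit of pre-committing.
P_OMEGA_EVER = 1e-9          # assumed chance of ever facing a genuine Omega mugging
P_HEADS, PRIZE = 0.5, 10_000 # heads-side prize assumed from the standard problem

expected_benefit = P_OMEGA_EVER * P_HEADS * PRIZE   # ~0.000005 dollars
DEVICE_COST = 1.0            # e.g. escrow fees or the hassle of arranging it

print(expected_benefit, expected_benefit < DEVICE_COST)  # the hack must be free to be worth it
```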
Had we access to our own source code, and the capacity for self-modification, we could hardcode a path for when this decision arises. Currently we have to work with the hardware we have, but I believe our brains do have mechanisms for binding our future decisions to commitments that will be irrational to keep when the time comes. Making credible threats requires us to back up what we say, even to someone who we will never encounter again afterwards, so similar situations (without the absolute predictive ability) are quite common in life. I know in the past I have acted perversely against my own self-interest to satisfy a past decision / issued threat. In most cases this should be considered irrationality to be removed from myself, but I think I can reuse the same mechanism to achieve an improvement here.
Obviously I can only guess whether this will in fact work in practice. I believe it will for the $100 case, but suspect that with some of the raised-stakes examples given (committing murder etc.), my future self may wiggle out of the emotional trap I’ve set for him. This is a flaw with my brain-hacking methods, however; hardcoding would still be the right thing to do if it were possible and the payoff were one I would willingly trade the cost for.
(This only scales to the point where you value your integrity, however: you may be able to live with yourself better after finding you’re self-deluding than after murdering 15 people to prove a point.)
This is precisely my reasoning too. It doesn’t seem at all sensible to me that the principle of “acting as one would formerly have liked to have precommitted to acting” should have unbounded utility.
ETA: When you say:
Making credible threats requires us to back up what we say, even to someone who we will never encounter again afterwards, so similar situations (without the absolute predictive ability) are quite common in life. I know in the past I have acted perversely against my own self-interest to satisfy a past decision / issued threat.
Now this seems a very good point to me indeed. If we have evolved machinery present in our brains that predictably and unavoidably makes us feel good about following through on a threat and bad about not doing so—and I think that we do have that machinery—then this comes close to resolving the problem. But the point about such a mechanism is that it is tuned to have a limited effect—an effect that I am pretty sure would be insufficient to cause me to murder 15 people in the vast majority of circumstances.
It doesn’t seem at all sensible to me that the principle of “acting as one would formerly have liked to have precommitted to acting” should have unbounded utility.
Mostly agreed, though I’d quibble that it does have unbounded utility, but that I probably don’t have unbounded capability to enact the strategy. If I were capable of (cheaply) compelling my future self to murder in situations where it would be a general advantage to precommit, I would.