By adopting and sticking to such a strategy, I will on average come out ahead in a wide variety of Newcomblike situations.
Definitely.
I believe I am now in fact hacked to hardcode the future-irrational decision if it does occur.
Here lies my problem. I would like to adopt such a strategy (or a better one if any exists), and not alter my strategy when I actually encounter a Newcomblike situation. Now in the original Newcomb problem, I have no reason to do so: if I alter my strategy so as to two-box, then I will end up with less money (although I would have difficulties proving this in the formalism I use in the article). But in the mugging problem, altering my strategy to “keep $100 in this instance only” will, in an (Omega appears, coin is tails) state, net me more money. Therefore I believe that keeping to my strategy must have intrinsic value to me, greater than that of the $100 I would lose, in order for me to keep it.
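The tension described above can be put into numbers. This is only a rough sketch: the $100 cost comes from the discussion, but the fair coin, the $10,000 reward on heads (taken from the usual statement of the mugging problem), and all function names are assumptions made for illustration.

```python
# Sketch of the mugging payoffs, before and after the coin is known.
# Assumptions (not stated in the discussion): a fair coin, and a
# hypothetical reward of $10,000 on heads for agents who would pay on tails.
COST = 100        # what Omega asks for on tails (from the discussion)
REWARD = 10_000   # assumed payout on heads
P_HEADS = 0.5     # assumed fair coin


def ex_ante_value(pays_on_tails: bool) -> float:
    """Expected payoff of a policy, evaluated before the coin is flipped."""
    heads = REWARD if pays_on_tails else 0
    tails = -COST if pays_on_tails else 0
    return P_HEADS * heads + (1 - P_HEADS) * tails


def ex_post_tails_value(pays_on_tails: bool,
                        intrinsic_value_of_keeping: float = 0.0) -> float:
    """Payoff once the agent already knows the coin came up tails.

    intrinsic_value_of_keeping is a hypothetical utility the agent
    attaches to sticking with its strategy, measured in dollars.
    """
    if pays_on_tails:
        return -COST + intrinsic_value_of_keeping
    return 0.0


def keeps_strategy(intrinsic_value_of_keeping: float) -> bool:
    """The agent pays on tails only if paying beats refusing ex post."""
    return (ex_post_tails_value(True, intrinsic_value_of_keeping)
            > ex_post_tails_value(False))
```

Under these assumptions the paying policy is better before the flip, while refusing is better once tails is known, unless keeping the strategy is itself worth more than $100 to the agent.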
Now I can answer your question about how the MAD brain-hack and the mugging brain-hack are related. In the MAD situation, the institution’s actions are “hardcoded” to occur. In the case of the mugging brain-hack, this would count as, say, wiring a device to one’s brain that takes over in Omega situations. This may well be possible in some situations, but I wanted to deal with the harder problem of how to fashion a brain that, on learning it is in a “tails” state, does not then want to remove such a hack.
Now if I expect to be faced with many Omega mugging problems in the future, then a glimmer of hope appears; although “keep $100 in this instance only” may then seem to be an improved strategy, I know that this conclusion must in fact be incorrect, as whatever process I use to arrive at it is, if allowed to operate, highly likely to lose money for me in the future. In other words, this makes the problem more similar to Newcomb’s problem: in the states of the world in which I make the modification, I lose money <-> in the states of the world in which I two-box, I make less money. But the problem as posed involves an Omega turning up and convincing you that this problem is the last Newcomblike problem you will ever face.
ETA: In case it wasn’t clear, if I assign keeping my strategy an intrinsic value greater than that of keeping the $100, then I will surely keep my strategy. My question is: in the case of Omega appearing and my becoming convinced that I am facing my last ever Newcomblike problem, will keeping my strategy still have intrinsic value to me?
It all depends on how the hack is administered. If future-me does think rationally, he will indeed come to the conclusion that he should not pay. Any brain-hack that will actually be successful must then be tied to a superseding rational decision or to something other than rationality. If not tied to rationality, it needs to be a hardcoded response, immediately implemented, rather than one that is thought about.
There are obvious ways to set up a superseding condition: put $101 in escrow, or hire an assassin to kill you if you renege. But obviously the cost of doing this now is far higher than is justified by the probability of the situation, so we need something completely free. One option is to tie it to something internally valued; e.g., you value your given word or self-honesty sufficiently that living with yourself after compromising it is worse than a negative $100 utility. (This only scales to the point where you value integrity, however: you may be able to live with yourself better after finding you’re self-deluding than after murdering 15 people to prove a point.)
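The cost argument at the start of the paragraph above amounts to a simple comparison, sketched here. The probability of ever meeting Omega and the gain from being committed are made-up numbers chosen only to illustrate the point, the function name is hypothetical, and treating the escrowed $101 as a pure cost is a simplification.

```python
# Sketch: is a costly commitment device worth setting up in advance?
# All numbers below are illustrative assumptions, not from the discussion,
# except the $101 escrow figure.

def commitment_worthwhile(device_cost: float,
                          p_situation: float,
                          gain_if_committed: float) -> bool:
    """True if paying device_cost now is justified by the expected gain.

    p_situation is the (assumed) probability that the mugging ever occurs;
    gain_if_committed is the (assumed) expected benefit of being a reliable
    payer, given that it does occur.
    """
    return device_cost < p_situation * gain_if_committed


ESCROW_COST = 101        # from the discussion, treated here as a pure cost
P_OMEGA = 1e-6           # assumed: the situation is very unlikely
GAIN = 4950              # assumed expected gain if the situation occurs

escrow_ok = commitment_worthwhile(ESCROW_COST, P_OMEGA, GAIN)
free_hack_ok = commitment_worthwhile(0, P_OMEGA, GAIN)
```

With numbers like these, the escrow fails the test while a device costing nothing passes it, which is why the paragraph goes looking for something completely free.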
Had we access to our own source code, and the capacity for self-modification, we could put in a hardcoded path for when this decision arises. Currently we have to work with the hardware we have, but I believe our brains do have mechanisms for tying future decisions to then-irrational decisions. Making credible threats requires us to back up what we say, even to someone who we will never encounter again afterwards, so similar situations (without the absolute predictive ability) are quite common in life. I know in the past I have acted perversely against my own self-interest to satisfy a past decision / issued threat. In most cases this should be considered irrationality to be removed from myself, but I think I can reuse the same mechanism to achieve an improvement here.
Obviously I can only guess whether this will in fact work in practice. I believe it will for the $100 case, but suspect that with some of the raised-stakes examples given (committing murder etc.), my future self may wiggle out of the emotional trap I’ve set for him. This is a flaw in my brain-hacking methods, however: hardcoding would still be the right thing to do if possible, if the payoff were one that I would willingly trade the cost for.
(This only scales to the point where you value integrity, however: you may be able to live with yourself better after finding you’re self-deluding than after murdering 15 people to prove a point.)
This is precisely my reasoning too. It doesn’t seem at all sensible to me that the principle of “acting as one would formerly have liked to have precommitted to acting” should have unbounded utility.
ETA: When you say:
Making credible threats requires us to back up what we say, even to someone who we will never encounter again afterwards, so similar situations (without the absolute predictive ability) are quite common in life. I know in the past I have acted perversely against my own self-interest to satisfy a past decision / issued threat.
Now this seems a very good point to me indeed. If we have evolved machinery present in our brains that predictably and unavoidably makes us feel good about following through on a threat and bad about not doing so—and I think that we do have that machinery—then this comes close to resolving the problem. But the point about such a mechanism is that it is tuned to have a limited effect—an effect that I am pretty sure would be insufficient to cause me to murder 15 people in the vast majority of circumstances.
It doesn’t seem at all sensible to me that the principle of “acting as one would formerly have liked to have precommitted to acting” should have unbounded utility.
Mostly agreed, though I’d quibble that it does have unbounded utility, but that I probably don’t have unbounded capability to enact the strategy. If I were capable of (cheaply) compelling my future self to murder in situations where it would be a general advantage to precommit, I would.