It all depends on how the hack is administered. If future-me does think rationally, he will indeed come to the conclusion that he should not pay. Any brain-hack that will actually be successful must then be tied to a superseding rational decision or to something other than rationality. If not tied to rationality, it needs to be a hardcoded response, immediately implemented, rather than one that is thought about.
There are obvious ways to set up a superseding condition: put $101 in escrow, or hire an assassin to kill you if you renege. But the cost of doing this now is far higher than the probability of the situation justifies, so we need something completely free. One option is to tie it to something internally valued; e.g., you value your given word or self-honesty enough that living with yourself after compromising it is worse than a negative $100 utility. (This only scales to the point where you value integrity, however: you may be able to live with yourself better after finding you’re self-deluding than after murdering 15 people to prove a point.)
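To make the cost comparison concrete, here is a minimal sketch in Python. The probability, the net gain from being a committed agent, the escrow fee, and the dollar value placed on integrity are all made-up illustrative numbers, not figures from the problem, and the function names are hypothetical too. The first function checks whether a commitment device is worth buying in advance; the second models the caveat that an integrity-based commitment only holds while breaking your word costs you more than following through does.

```python
def worth_precommitting(p_situation, net_gain_if_committed, device_cost):
    """True if the expected benefit of being committed exceeds the upfront cost.

    All three inputs are illustrative assumptions, in dollars of utility.
    """
    expected_benefit = p_situation * net_gain_if_committed
    return expected_benefit > device_cost


def commitment_holds(follow_through_cost, integrity_value):
    """True if breaking your word would hurt more than following through."""
    return integrity_value > follow_through_cost


# Hypothetical numbers: a one-in-a-million situation with a $5,000 net gain.
P_SITUATION = 1e-6
NET_GAIN = 5000.0

# A paid device (assumed $5 escrow fee) versus a free internal commitment.
escrow_ok = worth_precommitting(P_SITUATION, NET_GAIN, device_cost=5.0)
free_ok = worth_precommitting(P_SITUATION, NET_GAIN, device_cost=0.0)

# Assume integrity is "worth" $1,000 to you: enough to cover paying $100,
# but not enough to cover a far costlier act.
pays_100 = commitment_holds(follow_through_cost=100.0, integrity_value=1000.0)
extreme = commitment_holds(follow_through_cost=1e9, integrity_value=1000.0)

print(escrow_ok, free_ok, pays_100, extreme)
```

With these assumed numbers the paid device is not worth it, the free one is, and the integrity commitment covers the $100 case but not the extreme one. Different inputs would of course move those thresholds.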
Had we access to our own source code, and the capacity for self-modification, we could put in a hardcoded path for when this decision arises. Currently we have to work with the hardware we have, but I believe our brains do have mechanisms for tying future decisions to then-irrational decisions. Making credible threats requires us to back up what we say, even to someone who we will never encounter again afterwards, so similar situations (without the absolute predictive ability) are quite common in life. I know in the past I have acted perversely against my own self-interest to satisfy a past decision / issued threat. In most cases this should be considered irrationality to be removed from myself, but I think I can reuse the same mechanism to achieve an improvement here.
Obviously I can only guess whether this will in fact work in practice. I believe it will for the $100 case, but suspect that with some of the raised-stakes examples given (committing murder, etc.), my future self may wiggle out of the emotional trap I’ve set for him. This is a flaw in my brain-hacking methods, however: hardcoding would still be the right thing to do if possible, if the payoff were one that I would willingly trade the cost for.
(This only scales to the point where you value integrity, however: you may be able to live with yourself better after finding you’re self-deluding than after murdering 15 people to prove a point.)
This is precisely my reasoning too. It doesn’t seem at all sensible to me that the principle of “acting as one would formerly have liked to have precommitted to acting” should have unbounded utility.
ETA: When you say:
Making credible threats requires us to back up what we say, even to someone who we will never encounter again afterwards, so similar situations (without the absolute predictive ability) are quite common in life. I know in the past I have acted perversely against my own self-interest to satisfy a past decision / issued threat.
Now this seems a very good point to me indeed. If we have evolved machinery present in our brains that predictably and unavoidably makes us feel good about following through on a threat and bad about not doing so—and I think that we do have that machinery—then this comes close to resolving the problem. But the point about such a mechanism is that it is tuned to have a limited effect—an effect that I am pretty sure would be insufficient to cause me to murder 15 people in the vast majority of circumstances.
It doesn’t seem at all sensible to me that the principle of “acting as one would formerly have liked to have precommitted to acting” should have unbounded utility.
Mostly agreed, though I’d quibble that it does have unbounded utility, but that I probably don’t have unbounded capability to enact the strategy. If I were capable of (cheaply) compelling my future self to murder in situations where it would be a general advantage to precommit, I would.