Counterfactual Mugging
Related to: Can Counterfactuals Be True?, Newcomb’s Problem and Regret of Rationality.
Imagine that one day, Omega comes to you and says that it has just tossed a fair coin, and since the coin came up tails, it has decided to ask you to give it $100. Whatever you do in this situation, nothing else in reality will happen differently as a result. Naturally you don’t want to give up your $100. But, Omega tells you, if the coin had come up heads instead of tails, it would have given you $10000, but only if you would have agreed to give it $100 had the coin come up tails.
Omega can predict your decision in the case where it asks you to give it $100: even if that case hasn’t actually happened, it can compute the counterfactual truth. Omega is also known to be absolutely honest and trustworthy, with no word-twisting, so the facts really are as it says: it really tossed a coin, and it really would have given you $10000.
From your current position, it seems absurd to give up your $100. Nothing good happens if you do: the coin has already landed tails up, and you’ll never see the counterfactual $10000. But look at this situation from your point of view before Omega tossed the coin. There you have two possible branches ahead of you, of equal probability. On one branch you are asked to part with $100, and on the other branch you are conditionally given $10000. If you decide to keep your $100, the expected gain from this decision is $0: there is no exchange of money, you don’t give Omega anything on the first branch, and as a result Omega doesn’t give you anything on the second branch. If you decide to give up $100 on the first branch, then Omega gives you $10000 on the second branch, so the expected gain from this decision is
-$100 * 0.5 + $10000 * 0.5 = $4950
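The arithmetic can be spelled out in a few lines (just a sketch of the calculation above, comparing the two possible dispositions before the toss):

```python
# Expected gain of each disposition, evaluated before the coin toss.
def expected_gain(pays_when_tails):
    # Omega rewards on heads only the sort of person who pays on tails.
    heads = 10_000 if pays_when_tails else 0
    tails = -100 if pays_when_tails else 0
    return 0.5 * heads + 0.5 * tails

print(expected_gain(False))  # 0.0
print(expected_gain(True))   # 4950.0
```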
So, this straightforward calculation tells you that you ought to give up your $100. It looks like a good idea before the coin toss, but it starts to look like a bad one after the coin has come up tails. Had you known about the deal in advance, one possible course of action would have been to set up a precommitment: you contract with a third party, agreeing that you’ll forfeit $1000 if you don’t give Omega the $100 it asks for. In this case, you leave yourself no other choice.
But in this game, explicit precommitment is not an option: you didn’t know about Omega’s little game until the coin had already been tossed and the outcome of the toss given to you. The only thing that stands between Omega and your $100 is your ritual of cognition. And so I ask you all: is the decision to give up $100 when you have no real benefit from it, only counterfactual benefit, an example of winning?
P.S. Let’s assume that the coin is deterministic, so that in the overwhelming measure of the MWI worlds it gives the same outcome. You don’t care about the fraction that sees a different result; in all reality, the result is that Omega won’t even consider giving you $10000, it only asks for your $100. Also, the deal is unique: you won’t ever see Omega again.
Imagine that one day you come home to see your neighbors milling about your house and the Publisher’s Clearinghouse (PHC) van just pulling away. You know that PHC has recently been running a new schtick of selling $100 lottery tickets for a $10,000 prize instead of just giving money away. In fact, you’ve used that very contest as a teachable moment with your kids, explaining how, once the first of the 100 printed tickets was sold, scratched, and found not to be the winner, the expected value of each remaining ticket exceeded its cost, and the tickets were therefore increasingly worth buying. Now, weeks later, most of the tickets have been sold, scratched, and found not to be winners, and PHC came to your house. In fact, there were only two tickets remaining. And you weren’t home. Fortunately, your neighbor and best friend Bob asked if he could buy the ticket for you. Sensing a great human interest story (and lots of publicity), PHC said yes. Unfortunately, Bob picked the wrong ticket. After all your neighbors disperse and Bob and you are alone, Bob says that he’d really appreciate it if he could get his hundred dollars back. Is he mugging you? Or do you give it to him?
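The ticket-buying logic in this scenario can be checked with a quick sketch (assuming 100 tickets at $100 each and a single $10,000 winner, as stated above):

```python
# Expected value of one remaining ticket once some losers have been scratched.
def ticket_ev(losers_scratched):
    remaining = 100 - losers_scratched
    # The winner is equally likely to be any of the remaining tickets.
    return 10_000 / remaining

print(ticket_ev(0))   # 100.0 -- at the start, EV exactly equals the price
print(ticket_ev(1))   # just over 100 -- tickets become +EV
print(ticket_ev(98))  # 5000.0 -- two tickets left, as in the story
```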
Yes, I think you still owe him the $100.
But I like how you made it into a relatively realistic scenario.
Considering the ticket was worth $5,000 when he bought it, sure.
Did you give the same answer to Omega? The cases are exactly analogous. (Or do you argue that they are not?)
The disanalogy here is that you have a long-term social relationship with Bob that you don’t have with Omega, and the $100 is an investment in that relationship.
Also, there is the possibility of future scenarios arising in which Bob could choose to take comparable actions, and we want to encourage him in doing so. I agree that the cases are not exactly analogous.
The outcomes don’t seem to be tied together as they were in the original problem: is it true that, had he won, he would only have given you the money if, had he not won, you would have given him the $100 back? That isn’t clear.
The counterfactual anti-mugging: One day No-mega appears. No-mega is completely trustworthy etc. No-mega describes the counterfactual mugging to you, and predicts what you would have done in that situation (not having met No-mega) if Omega had asked you for $100.
If you would have given Omega the $100, No-mega gives you nothing. If you would not have given Omega $100, No-mega gives you $10000. No-mega doesn’t ask you any questions or offer you any choices. Do you get the money? Would an ideal rationalist get the money?
Okay, next scenario: you have a magic box with a number p inscribed on it. When you open it, either No-mega comes out (probability p) and performs a counterfactual anti-mugging, or Omega comes out (probability 1-p), flips a fair coin and proceeds to either ask for $100, give you $10000, or give you nothing, as in the counterfactual mugging.
Before you open the box, you have a chance to precommit. What do you do?
I would have no actionable suspicion that I should give Omega the $100 unless I knew about No-mega. So I get the $10000 only if No-mega asks the question “What would Eliezer do knowing about No-mega?” and not if No-mega asks the question “What would Eliezer do not knowing about No-mega?”
You forgot about MetaOmega, who gives you $10,000 if and only if No-mega wouldn’t have given you anything, and O-mega, who kills your family unless you’re an Alphabetic Decision Theorist. This comment doesn’t seem specifically anti-UDT—after all, Omega and No-mega are approximately equally likely to exist (a ratio of 1:1, if not an actual p of .5)—but it still has the ring of Just Cheating. Admittedly, I don’t have any formal way of telling the difference between decision problems that feel more or less legitimate, but I think part of the answer might be that the Counterfactual Mugging isn’t really about how to act around superintelligences: it illustrates a more general need to condition our decisions on counterfactuals, and as EY pointed out, UDT still wins the No-mega problem if you know about No-mega, so whether or not we should subscribe to some decision theory isn’t all that dependent on which superintelligences we encounter.
I’m necroing pretty hard and might be assuming too much about what Caspian originally meant, so the above is more me working this out for myself than anything else. But if anyone can explain why the No-mega problem feels like cheating to me, that would be appreciated.
Do you have a point?
Yes: that there can just as easily be a superintelligence that rewards people predicted to act one way as one that rewards people predicted to act the other. Which precommitment is most rational depends on which type you expect to encounter.
I don’t expect to encounter either, and on the other hand I can’t rule out fallible human analogues of either. So for now I’m not precommitting either way.
You don’t precommit to “give away the $100, to anyone who asks”. You precommit to give away the $100 in exactly the situation I described. Or, generalizing such precommitments, you just compute your decisions on the spot, in a reflectively consistent fashion. If that’s what you want to do with your future self, that is.
Yeah, now. But after Omega really, really appears in front of you, the chance of Omega existing is about 1, while the chance of No-mega is still almost nonexistent. In this problem, the existence of Omega is given. It’s not something you are expecting to encounter now, just as we’re not expecting to encounter eccentric Kavkan billionaires who will give you money for drinking a toxin. Kavka’s Toxin and the counterfactual mugging present a scenario that is given, and ask how you would act in it.
But you aren’t supposed to be updating… the essence of UDT, I believe, is that your policy should be set NOW, and NEVER UPDATED.
So… either:
You consider the choice of policy based on the prior where you DIDN’T KNOW whether you’d face Nomega or Omega, and NEVER UPDATE IT (this seems obviously wrong to me: why are you using your old prior instead of your current posterior?), or
You consider the choice of policy based on the prior where you KNOW that you are facing Omega AND that the coin is tails, in which case paying Omega only loses you money.
It doesn’t prevent doing different actions in different circumstances, though. That’s not what “updateless” means. It means that you should act as your past self would have precommitted to doing in your situation. Your probability estimate for “I see Omega” should be significantly greater than “I see Omega, and also Nomega is watching and deciding how to act”, so your decision should be mostly determined by Omega, not Nomega. (The Metanomega also applies—there’s a roughly equal chance of Metanomega or Nomega waiting and watching. [Metanomega = Nomega reversed; gives payoff iff predicts you paying.])
I see where I went wrong. I assumed that the impact of one’s response to Omega is limited to the number of worlds in which Omega exists. That is, my reasoning is invalid if (“what I do in scenario X” is meaningful and affects the world even if scenario X never happens). In other words, when one is being counterfactually modeled, which is exactly the topic of discussion.
Thanks for pointing that out. The answer is, as expected, a function of p. So I now find explanations of why UDT gets mugged incomplete and misleading.
Here’s my analysis:
The action set is {give, don’t give}, which I’ll identify with {1, 0}. Now, the possible deterministic policies are simply every mapping from {N, O} → {1, 0}, of which there are 4.
We can disregard the policies for which pi(N) = 1, since giving money to Nomega serves no purpose. So we’re left with
pi_give
and
pi_don’t,
which give/don’t, respectively, to Omega.
Now, we can easily compute the expected value, measuring rewards in units of $100 (so $10000 = 100 and $100 = 1):
r(pi_give(N)) = 0
r(pi_give(O, heads)) = 100
r(pi_give(O, tails)) = −1
r(pi_don’t(N)) = 100
r(pi_don’t(O)) = 0
So now:
Eg := E_give(r) = 0 * p + .5 * (100 − 1) * (1 − p) = 49.5 * (1 − p)
Ed := E_don’t(r) = 100 * p + 0 * (1 − p) = 100 * p
Eg > Ed whenever 49.5 * (1 − p) > 100 * p,
i.e. whenever 49.5 > 149.5 * p,
i.e. whenever 99/299 > p.
So, whether you should precommit to being mugged depends on how likely you are to encounter N vs. O, which is intuitively obvious.
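A quick sketch confirms the break-even point (assuming the setup above: probability p of No-mega, 1 − p of Omega, payoffs in units of $100):

```python
# Expected value of each policy against the No-mega/Omega box, as a function
# of p, the probability of meeting No-mega. Units of $100: prize = 100,
# payment = 1.
def e_give(p):
    # No-mega (prob p) pays a giver nothing; Omega (prob 1-p) flips a fair
    # coin: heads pays +100, tails costs 1.
    return 0.0 * p + 0.5 * (100 - 1) * (1 - p)

def e_dont(p):
    # No-mega (prob p) pays a refuser +100; Omega (prob 1-p) pays nothing.
    return 100.0 * p

threshold = 99 / 299  # where the two policies break even
print(e_give(threshold), e_dont(threshold))  # equal at the threshold
print(e_give(0.2) > e_dont(0.2))  # No-mega unlikely: precommit to pay
print(e_give(0.4) > e_dont(0.4))  # No-mega likely: don't
```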
Philosopher Kenny Easwaran reported in 2007 that:
Korff also reinvents counterfactual mugging:
And he looks into generalizing to the algorithmic version:
Korff is now an Assistant Professor at Georgia State.
If it’s an iterated game, then the decision to pay is a lot less unintuitive.
My two bits: Omega’s request is unreasonable.
Precommitting is something that you can only do before the coin is flipped. That’s what the “pre” means. Omega’s game rewards a precommitment, but Omega is asking for a commitment.
Precommitting is a rational thing to do because before the coin toss, the result is unknown and unknowable, even by Omega (I assume that’s what “fair coin” means). This is a completely different course of action than committing after the coin toss is known! The utility computation for precommitment is not and should not be the same as the one for commitment.
In the example, you have access to information that pre-you doesn’t (the outcome of the flip). If rationalists are supposed to update on new information, then it is irrational for you to behave like pre-you.
Precommitment does make one-boxing on Newcomblike problems a whole lot easier. But it isn’t necessarily required. That’s why Vladimir made an effort to exclude precommitment.
I don’t agree. I suggest that pre-you has exactly the same information that you have. The pre-you must be considered to have been given exactly the same inputs as you to the extent that they influence the decision. That is implied by the ability of the Omega to make the accurate prediction that we have been assured he made.
By definition, pre-you only has access to the coin’s probability distribution, while you have access to the result of the coin flip. Surely you don’t mean to say that’s the same thing?
From the perspective of a non-superintelligence, Omega’s prediction abilities are indistinguishable from magic. Human beings can’t tell what they “imply.” Trying to figure out the implications with a primate brain will only get you into a paradox like claiming a fact is the same as a probability distribution. All we can reasonably do is stipulate Omega’s abilities needed to make the problem work and no further.
We’re assuming Omega is trustworthy? I’d give it the $100, of course.
Had the coin come up differently, Omega might have explained the secrets of friendly artificial general intelligence. However, he now asks that you murder 15 people.
Omega remains completely trustworthy, if a bit sick.
Ha, I’ll re-raise: Had the coin come up differently, Omega would have filled ten Hubble volumes with CEV-output. However, he now asks that you blow up this Hubble volume.
(Not only do you blow up the universe (ending humanity for eternity) you’re glad that Omega showed to offer this transparently excellent deal. Morbid, ne?)
Ouch.
For some reason, raising the stakes in these hypotheticals to the point of actual pain has become reflex for me. I’m not sure if it’s to help train my emotions to be able to make the right choices in horrible circumstances, or just my years in the Bardic Conspiracy looking for an outlet.
Raising the stakes in this way does not work, because of the issue described in Ethical Injunctions: it is less likely that Omega has presented you with this choice, than that you have gone insane.
So imagine yourself in the most inconvenient possible world where Omega is a known feature of the environment and has long been seen to follow through on promises of this type; it does not particularly occur to you or anyone that believing this fact makes you insane.
When I phrase it that way—imagine myself in a world full of other people confronted by similar Omega-induced dilemmas—I suddenly find that I feel substantially less uncomfortable; indicating that some of what I thought was pure ethical constraint is actually social ethical constraint. Still, it may function to the same self-protective effect as ethical constraint.
To add to the comments below, if you’re going to take this route, you might as well have already decided that encountering Omega at all is less likely than that you have gone insane.
That may be true, but it’s still a dodge. Conditional on not being insane, what’s your answer?
Additionally, I don’t see why Omega asking you to give it 100 dollars vs 15 human lives necessarily crosses the threshold of “more likely that I’m just a nutbar”. I don’t expect to talk to Omega anytime soon...
We’re assuming Omega is trustworthy? I’d murder 15 people, of course.
I’ll note that the assumption that I trust the Omega up to stakes this high is a big one. I imagine that the alterations being done to my brain in the counterfactualisation process would have rather widespread implications on many of my thought processes and beliefs once I had time to process it.
Completely agreed, a major problem in any realistic application of such scenarios.
I’m afraid I don’t follow.
Can you please explain the reasoning behind this? Given all of the restrictions mentioned (no iterations, no possible benefit to this self) I can’t see any reason to part with my hard earned cash. My “gut” says “Hell no!” but I’m curious to see if I’m missing something.
There are various intuition pumps to explain the answer.
The simplest is to imagine that a moment from now, Omega walks up to you and says “I’m sorry, I would have given you $10000, except I simulated what would happen if I asked you for $100 and you refused”. In that case, you would certainly wish you had been the sort of person to give up the $100.
Which means that right now, with both scenarios equally probable, you should want to be the sort of person who will give up the $100, since if you are that sort of person, there’s half a chance you’ll get $10000.
If you want to be the sort of person who’ll do X given Y, then when Y turns up, you’d better bloody well do X.
Well said. That’s a lot of the motivation behind my choice of decision theory in a nutshell.
Thanks, it’s good to know I’m on the right track =)
I think this core insight is one of the clearest changes in my thought process since starting to read OB/LW—I can’t imagine myself leaping to “well, I’d hand him $100, of course” a couple years ago.
I think this describes one of the core principles of virtue theory under any ethical system.
I wonder how much it depends upon accidents of human psychology, like our tendency to form habits, and how much of it is definitional (if you don’t X when Y, then you’re simply not the sort of person who Xes when Y)
That’s not the situation in question. The scenario laid out by Vladimir_Nesov does not allow for an equal probability of getting $10000 and paying $100. Omega has already flipped the coin, and it’s already been decided that I’m on the “losing” side. Join that with the fact that me giving $100 now does not increase the chance of me getting $10000 in the future because there is no repetition.
Perhaps there’s something fundamental I’m missing here, but the linearity of events seems pretty clear. If Omega really did calculate that I would give him the $100 then either he miscalculated, or this situation cannot actually occur.
-- EDIT --
There is a third possibility after reading Cameron’s reply… If Omega is correct and honest, then I am indeed going to give up the money.
But it’s a bit of a trick question, isn’t it? I’m going to give up the money because Omega says I’m going to give up the money, and everything Omega says is gospel truth. However, if Omega hadn’t said that I would give up the money, then I wouldn’t have given up the money. Which makes this a bit of an impossible situation.
Assuming the existence of Omega, his intelligence, and his honesty, this scenario is an impossibility.
I feel like a man in an Escher painting, with all these recursive hypothetical mes, hypothetical kuriges, and hypothetical omegas.
I’m saying, go ahead and start by imagining a situation like the one in the problem, except it’s all happening in the future—you don’t yet know how the coin will land.
You would want to decide in advance that if the coin came up against you, you would cough up $100.
The ability to precommit in this way gives you an advantage. It gives you half a chance at $10000 you would not otherwise have had.
So it’s a shame that in the problem as stated, you don’t get to precommit.
But the fact that you don’t get advance knowledge shouldn’t change anything. You can just decide for yourself, right now, to follow this simple rule:
If there is an action to which my past self would have precommited, given perfect knowledge, and my current preferences, I will take that action.
By adopting this rule, in any problem in which the opportunity for precommitting would have given you an advantage, you wind up gaining that advantage anyway.
That one sums it all up nicely!
I’m actually not quite satisfied with it. Probability is in the mind, which makes it difficult to know what I mean by “perfect knowledge”. Perfect knowledge would mean I also knew in advance that the coin would come up tails.
I know giving up the $100 is right, I’m just having a hard time figuring out what worlds the agent is summing over, and by what rules.
ETA: I think “if there was a true fact which my past self could have learned, which would have caused him to precommit etc.” should do the trick. Gonna have to sleep on that.
ETA2: “What would you do in situation X?” and “What would you like to pre-commit to doing, should you ever encounter situation X?” should, to a rational agent, be one and the same question.
...and that’s an even better way of putting it.
Note that this doesn’t apply here. It’s “What would you do if you were counterfactually mugged?” versus “What would you like to pre-commit to doing, should you ever be told about the coin flip before you knew the result?”. X isn’t the same.
MBlume:
This phrasing sounds about right. Whatever decision-making algorithm arrives at your decision D in situation X should also come to the same conditional decision before situation X appears: “if(X) then D”. If you actually don’t give away $100 in situation X, you should also plan not to give away $100 in case of X, before (or irrespective of whether) X happens. Whichever decision is the right one, there should be no inconsistency of this form. This grows harder if you must preserve the whole preference order.
“Perfect knowledge would mean I also knew in advance that the coin would come up tails.”
This seems crucial to me.
Given what I know when asked to hand over the $100, I would want to have pre-committed to not pre-committing to hand over the $100 if offered the original bet.
Given what I would know if I were offered the bet before discovering the outcome of the flip I would wish to pre-commit to handing it over.
From which information set I should evaluate this? The information set I am actually at seems the most natural choice, and it also seems to be the one that WINS (at least in this world).
What am I missing?
I’ll give you the quick and dirty patch for dealing with Omega: there is no way to know that, at that moment, you are not inside of his simulation. By giving him the $100, there is a chance you are transferring that money from within a simulation, which is about to be terminated, to outside of the simulation, with a nice big multiplier.
Not if precommiting potentially has other negative consequences. As Caspian suggested elsewhere in the thread, you should also consider the possibility that the universe contains No-megas who punish people who would cooperate with Omega.
...why should you also consider that possibility?
Because if that possibility exists, you should not necessarily precommit to cooperate with Omega, since that risks being punished by No-mega. In a universe of No-megas, precommitting to cooperate with Omega loses. This seems to me to create a distinction between the questions “what would you do upon encountering Omega?” and “what will you now precommit to doing upon encountering Omega?”
I suppose my real objection is that some people seem to have concluded in this thread that the correct thing to do is to, in advance, make some blanket precommitment to do the equivalent of cooperating with Omega should they ever find themselves in any similar problem. But I feel like these people have implicitly made some assumptions about what kind of Omega-like entities they are likely to encounter: for instance that they are much more likely to encounter Omega than No-mega.
But No-mega also punishes people who didn’t precommit but would have chosen to cooperate after meeting Omega. If you think No-mega is more likely than Omega, then you shouldn’t be that kind of person either. So it still doesn’t distinguish between the two questions.
“Perfect knowledge”?
Use a quantum coin; it conveniently comes up both.
I don’t see that this situation is impossible, but I think it’s because I’ve interpreted it differently from you.
First of all, I’ll assume that everyone agrees that given a 50/50 bet to win $10000 versus losing $100, everyone would take the bet. That’s a straightforward application of utilitarianism + probability theory = expected utility, right?
So Omega correctly predicts that you would have taken the bet if he had offered it to you (a real no brainer; I too can predict that you would have taken the bet had he offered it).
But he didn’t offer it to you. He comes up now, telling you that he predicted that you would accept the bet, and then carried out the bet without asking you (since he already knew you would accept the bet), and it turns out you lost. Now he’s asking you to give him $100. He’s not predicting that you will give him that number, nor is he demanding or commanding you to give it. He’s merely asking. So the question is, do you do it?
I don’t think there’s any inconsistency in this scenario regardless of whether you decide to give him the money or not, since Omega hasn’t told you what his prediction would be (though if we accept that Omega is infallible, then his prediction is obviously exactly whatever you would actually do in that situation).
Omega hasn’t told you his predictions in the given scenario.
That’s absolutely true. In exactly the same way, if the Omega really did calculate that I wouldn’t give him the $100 then either he miscalculated, or this situation cannot actually occur.
The difference between your counterfactual instance and my counterfactual instance is that yours just has a weird guy hassling you with deal you want to reject while my counterfactual is logically inconsistent for all values of ‘me’ that I identify as ‘me’.
Thank you. Now I grok.
So, if this scenario is logically inconsistent for all values of ‘me’ then there really is nothing that I can learn about ‘me’ from this problem. I wish I hadn’t thought about it so hard.
Logically inconsistent for all values of ‘me’ that would hand over the $100. For all values of ‘me’ that would keep the $100 it is logically consistent but rather obfuscated. It is difficult to answer a multiple choice question when considering the correct answer throws null.
I liked this position—insightful, so I’m definitely upvoting.
But I’m not altogether convinced it’s a completely compelling argument. With the amounts reversed, Omega could have walked up to you and said “I would have given you $100 except if I asked you for $10,000 you would have refused.” You’d then certainly wish to have been the sort of person to counterfactually have given up the $10,000, because in the real world it’d mean you’d get $100, even though you’d certainly REJECT that bet if you had a choice about it in advance.
Not necessarily; it depends on relative frequency. If Omega has a 10^-9 chance of asking me for $10000 and otherwise will simulate my response to judge whether to give me $100, and if I know that (perhaps Omega earlier warned me of this), I would want to be the type of person who gives the money.
Is that an acceptable correction?
Well, with a being like Omega running around, the two become more or less identical.
If we’re going to invent someone who can read thoughts perfectly, we may as well invent someone who can conceal thoughts perfectly.
Anyway, there aren’t any beings like Omega running around to my knowledge. If you think that concealing motivations is harder than I think, and that the only way to make another human think you’re a certain way is to be that way, say that.
And if Omega comes up to me and says “I was going to kill you if you gave me $100. But since I’ve worked out that you won’t, I’ll leave you alone.” then I’ll be damn glad I wouldn’t agree.
This really does seem like pointless speculation.
Of course, I live in a world where there is no being like Omega that I know of. If I knew otherwise, and knew something of their properties, I might govern myself differently.
We’re not talking Pascal’s Wager here, you’re not guessing at the behaviour of capricious omnipotent beings. Omega has told you his properties, and is assumed to be trustworthy.
You are stating that. But as far as I can tell Omega is telling me it’s a capricious omnipotent being. If there is a distinction, I’m not seeing it. Let me break it down for you:
1) Capricious → I am completely unable to predict its actions. Yes.
2) Omnipotent → Can do the seemingly impossible. Yes.
So, what’s the difference?
It’s not capricious in the sense you give: you are capable of predicting some of its actions: because it’s assumed Omega is perfectly trustworthy, you can predict with certainty what it will do if it tells you what it will do.
So, if it says it’ll give you 10k$ in some condition (say, if you one-box its challenge), you can predict that it’ll give it the money if that condition arises.
If it were capricious in the sense of complete inability of being predicted, it might amputate three of your toes and give you a flower garland.
Note that the problem supposes you do have certainty that Omega is trustworthy; I see no way of reaching that epistemological state, but then again I see no way Omega could be omnipotent, either.
On a somewhat unrelated note, why would Omega ask you for $100 if it had simulated you wouldn’t give it the money? Also, why would it do the same if it had simulated you would give it the money? What possible use would an omnipotent agent have for $100?
Omega is assumed to be mildly bored and mildly anthropic. And his asking you for $100 could always be PART of the simulation.
Yes, it’s quite reasonable that if it was curious about you it would simulate you and ask the simulation a question. But once it did that, since the simulation was perfect, why would it waste the time to ask the real you? After all, in the time it takes you to understand Omega’s question it could probably simulate you many times over.
So I’m starting to think that encountering Omega is actually pretty strong evidence for the fact that you’re simulated.
Maybe Omega recognizes in advance that you might think this way, doesn’t want it to happen, and so precommits to asking the real you. With the existence of this precommitment, you may not properly make this reasoning. Moreover, you should be able to figure out that Omega would precommit, thus making it unnecessary for him to explicitly tell you he’s doing so.
(Emphasis mine.)
I don’t think, given the usual problem formulation, that one can figure out what Omega wants without Omega explicitly saying it, and maybe not even in that case.
It’s a bit like a deal with a not-necessarily-evil devil. Even if it tells you something and you’re sure it’s not lying and you think the wording is perfectly clear, you should still assign a very high probability that you have no idea what’s really going on and why.
If we assume I’m rational, then I’m not going to assume anything about Omega. I’ll base my decisions on the given evidence. So far, that appears to be described as being no more and no less than what Omega cares to tell us.
Fine, then interchange “assume Omega is honest” with, say, “I’ve played a billion rounds of one-box two-box with him”… It should be close enough.
I realize this is fighting the problem, but: If I remember playing a billion rounds of the game with Omega, that is pretty strong evidence that I’m a (slightly altered) simulation. An average human takes about ten million breaths each year...
OK, so assume that I’m a transhuman and can actually do something a billion times. But if Omega can simulate me perfectly, why would it actually waste the time to ask you a question, once it simulated you answering it? Let alone do that a billion times… This also seems like evidence that I’m actually simulated. (I notice that in most statements of the problem, the wording is such that it is implied but not clearly stated that the non-simulated version of you is ever involved.)
I work on AI. In particular, on decision systems stable under self-modification. Any agent who does not give the $100 in situations like this will self-modify to give $100 in situations like this. I don’t spend a whole lot of time thinking about decision theories that are unstable under reflection. QED.
Even considering situations like this and having special cases for them sounds like it would add a bit much cruft to the system.
Do you have a working AI that I could look at to see how this would work?
If you need special cases, your decision theory is not consistent under reflection. In other words, it should simply always do the thing that it would precommit to doing, because, as MBlume put it, the decision theory is formulated in such fashion that “What would you precommit to?” and “What will you do?” work out to be one and the same question.
But this is precisely what humans don’t do, because we respond to a “near” situation differently than a “far” one. Your advance prediction of your decision is untrustworthy unless you can successfully simulate the real future environment in your mind with sufficient sensory detail to invoke “near” reasoning. Otherwise, you will fail to reach a consistent decision in the actual situation.
Unless, of course, in the actual situation, you’re projecting back: “What would I have decided in advance to do had I thought about this in advance?”—and you successfully mitigate all priming effects and situationally-motivated reasoning.
Or to put all of the above in short, common-wisdom form: “that’s easy for you to say NOW...” ;-)
Here is one intuitive way of looking at it:
Before tossing the coin, the Omega perfectly emulates my decision making process. In this emulation he tells me that I lost the coin toss, explains the deal and asks me to give him $100. If this emulated me gives up the $100 then he has a good chance of getting $10,000.
I have absolutely no way of knowing whether I am the ‘emulated me’ or the real me. Vladimir’s specification is quite unambiguous. I, me, the one doing the deciding right now in this real world, am the same me as the one inside the Omega’s head. If the emulation is in any way different to me then the Omega isn’t the Omega. The guy in the Omega’s head has been offered a deal that any rational man would accept, and I am that man.
So, it may sound stupid that I’m giving up $100 with no hope of getting anything back. But that’s because the counterfactual is stupid, not me.
(Disclaimer: I’m going to use the exact language you used, which means I will call you “stupid” in this post. I apologize if this comes off as trollish. I will admit that I am also quite torn about this decision, and I feel quite stupid too.)
No offense, but assuming free will, you are the one who is deciding to actually hand over the $100. The counterfactual isn’t the one making the decision. You are. You are in a situation, and there are two possible actions (lose $100 or don’t lose $100), and you are choosing to lose $100.
So again, are you sure you are not stupid?
And now I try to calculate what you should treat as being the probability that you’re being emulated. Assume that Omega only emulates you if the coin comes up heads.
Suppose you decide beforehand that you are going to give Omega the $100, as you ought to. The expected value of this is $4950, as has been calculated.
Suppose that instead, you decide beforehand that E is the probability you’re being emulated assuming you hear that the coin came up tails. You’ll still decide to give Omega the $100; therefore, your expected value if you hear that it came up heads is $10,000. Your expected value if you hear that the coin came up tails is -$100(1-E) + $10,000E.
The probability that you hear that the coin comes up tails should be given by P(H) + P(T and ~E) + P(T and E) = 0, P(H) = P(T and ~E), P(T and ~E) = P(T) - P(T and E), P(T and E) = P(E|T) * P(T). Solving these equations, I get P(E|T) = 2, which probably means I’ve made a mistake somewhere. If not, c’est l’Omega?
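One way to get a sensible number here is to fix the setup explicitly. The assumption below, that Omega runs exactly one emulation on heads and tells that emulation the coin came up tails, is mine rather than the thread’s; under it, a quick Monte Carlo sketch gives P(emulated | heard tails) ≈ 1/2 rather than an impossible value:

```python
import random

# Assumption (mine, not the thread's): on heads, Omega runs exactly one
# emulation of you and tells it "tails"; on tails, the real you is told
# "tails". Either way, exactly one observer hears "tails" per coin flip.
rng = random.Random(0)
n = 100_000
emulated_count = 0
for _ in range(n):
    heads = rng.random() < 0.5
    if heads:
        # The observer hearing "tails" on this flip is the emulation.
        emulated_count += 1

# Conditional on hearing "tails", the chance you are the emulation:
p_emulated = emulated_count / n  # ≈ 0.5 under this assumption
```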
Um… let’s see…
To REALLY evaluate that, we technically need to know how long Omega runs the simulation for.
Now, we have two options: one, assume Omega keeps running the simulation indefinitely; two, assume that Omega shuts the simulation down once he has the info he’s looking for (and before he has to worry about debugging the simulation).
In #1, what we are left with is p(S)=1/3, p(H)=1/3, p(T)=1/3, which means we’re moving $200/3 from part of our possibility cloud to gain $10,000/3 in another part.
In #2, we’re moving a total of $100/2 to gain $10,000/2. The $100 in the simulation is quantum-virtual.
So, unless you have reason to suspect Omega is running a LOT of simulations of you, AND not terminating them after a minute or so… (i.e., inadvertently simulation-mugging you)…
You can generally treat Omega’s simulation capacity as a dashed causality arrow from one universe to another, sort of like the shadow produced by the simulation...
So from my and Omega’s perspective this coin is random and my behavior is predictable. Amusing. My question: What if Omega says “due to quirks in your neurology, had I requested it, you would have pre-committed to bet $100 against $46.32. As it happens, you lost anyway, but you would have taken an unfavorable deal.” Would you pay then?
Nope. I don’t care what quirks in my neurology do—I don’t care what answer the material calculator returns, only the answer to 2 + 2 = ?
Meh, the original is badly worded.
Take 2. Omega notices a neuro-quirk. Then, based on what he’s noticed, he offers you a 50/50 bet of $100 against $43.25 at just the right time with just the right intonation...
NOW do you take that bet?
...Why yes, yes you do. Even you. And you know it. It’s related to why you don’t think boxing an AI is the answer. Only, Omega’s already out of the box, and so can adjust your visual and auditory input with a much higher degree of precision.
No it isn’t. Your ‘Take 2’ is an entirely different question. One that seems to miss the point. The question “Can Omega exploit a vulnerability of human psychology?” isn’t a particularly interesting one and becomes even less so when by the definition of Omega and the problem specification the answer is either “Yes” or “I deny the counterfactual” regardless of anything to do with vulnerabilities in human intellectual capabilities.
Oh, whoops… so more like a way of poking holes in the strategy “I will do whatever I would have precommitted to do”?
A way of trying to, yes.
The coin toss may be known to Omega and predicted in advance, it only needs to initially have 50/50 odds to you for the expected gain calculation to hold. When Omega tells you about the coin, it communicates to you its knowledge about the toss, about an independent variable of initial 50/50 odds. For example, Omega may tell you that it hasn’t tossed the coin yet, it’ll do so only a thousand years from now, but it predicted that the coin will come up tails, so it asks you for your $100.
This requires though that Omega have decided to make the bet in a fashion which exhibited no dependency on its advance knowledge of the coin.
This is a big issue which I unsuccessfully tried to address in my non-existing 6+ paragraph explanation. Why the heck is Omega making bets if he can already predict everything anyway?
That said, it’s not clear that when Omega offers you a bet, you should automatically refuse it under the assumption that Omega is trying to “beat” you. It seems like Omega doesn’t really mind giving away money (pretty reasonable for an omniscient entity), since he seems to be willing to leave boxes with millions of dollars in them just lying around.
What is Omega’s purpose is entirely unknown. Maybe he wants you to win these bets. If you’re a rational person who “wants to win”, I think you can just “not worry” about what Omega’s intents are, and figure out what sequence of actions maximizes your utility (which in these examples always seems to directly translate into maximizing the amount of money you get).
Quantum coins. Seriously. They’re easy enough to predict if you accept many-worlds.
As for the rest… entertainment? Could be a case of “even though I can predict these humans so well, it’s fascinating just how many of them two-box no matter how obvious I make it.”
It’s not impossible: we know that we exist, and it is not impossible that some race resembling our own figured out a sufficient solution to the Löb problem and became a race of Omegas...
That’s just like playing “Eeny, meeny, miny, moe” to determine who’s ‘it’. Once you figure out if there’s an even or odd number of words, you know the answer, and it isn’t random to you anymore. This may be great as a kid choosing who gets a cookie (wow! I win again!), but you’re no longer talking about something that can go either way.
For a random output of a known function, you still need a random input.
The trick with eeny-meeny-miney-moe is that it’s long enough for us to not consciously and quickly identify whether the saying is odd or even, gives a 0, 1, or 2 modulo 3, etc., unless we TRY to remember what it produces, or TRY to remember if it’s odd or even before pointing it out. Knowing that doing so consciously ruins its capacity, we can turn to memory decay to restore some of the pseudo-random quality. Basically, by sufficiently decoupling “point at A” from “choose A” in our internal cognitive algorithms, we change the way we route visual input and spit out a “point at X”.
THAT’S where the randomness of eeny-meeny-miney-moe comes in... though I’ve probably got only one use left of it when it comes to situations with 2 items, thanks to writing this up...
There exist QUANTUM coins, you know. When they see a fork in the road, they take it.
I’d be feeling a little queasy if Omega came up to me and said that. Maybe I’d say “erm, thanks for not taking advantage of me, then... I guess?”
Hi,
My name is Omega. You may have heard of me.
Anyway, I have just tossed a fair coin, and given that the coin came up tails, I’m gonna have to ask each of you to give me $100. Whatever you do in this situation, nothing else will happen differently in reality as a result. Naturally you don’t want to give up your $100. But see, if the coin came up heads instead of tails, I’d have given you each $10000, but only to those of you who would agree to give me $100 if the coin came up tails.
You forgot to add that we have sufficient reason to believe everything you say.
I don’t believe you.
You know, if Omega is truly doing a full simulation of my cognitive algorithm, then it seems my interactions with him should be dominated by my desire for him to stop it, since he is effectively creating and murdering copies of me.
The decision doesn’t need to be read off from a straightforward simulation; it can be, so to say, an on-demand reconstruction of the outcome from the counterfactual. I believe it should be possible to calculate just your decision, without constructing a morally significant computation. Knowing your decision may be as simple as checking whether you adhere to a certain decision theory.
There is no rule that says I need to care what the Omega does in his own head. If you object to being temporarily emulated then I can certainly see why you would be averse to that. But I don’t happen to object, nor do I feel in any way obliged to. Even if I’m the emulated me.
But... if you’re the emulated you... you’re going to die after you give him / don’t give him the money.
I can see why that bothers you and it is perfectly reasonable to include a penalty for being temporarily emulated in your utility function.
The way I see it is that I started off as one me, had an extra emulated me for a bit and then was back to me again. There is just as much utility left over as there was at the start, with a bit extra thrown in for the favourable dice game. I have lost nothing.
The emulated me has nothing to gain by caring whether he is the real me or not. If he had the option of breaking free and being Cam II then by all means he would do so. But as it happens the emulated me exists for a finite term and I have no way of exploiting that through my decision making. I just make whatever decision benefits the real me, whether that be this me or the other me.
This is the way I see things: I am an AI running on an installation of Pen Drive Linux, an operating system that runs off a USB stick. Omega tells me that he has purchased 10,000 identical USB sticks and duplicated me onto each of them. He tells me that in 1 hour he is going to destroy 10,000 of the USB sticks but double the processing power and RAM on the computer that runs me. He plans to repeat the same procedure every day.
I have one hour. Do I use that time to exploit my net connection, appropriate funds, and hire an assassin to kill Omega so he ceases his evil mass murder?
As it happens, I don’t. It bothers me not at all which of the 10,001 clones I am. There’s still going to be a me that is more or less the same as me. If it happens that the copy of me currently running is destroyed I am quite willing to accept that. I don’t consider myself ‘killed’ or ‘dead’. I consider that I lost the memory of one conversation with some crazy Omega but gained a bunch of processing power and ram. Whatever. Go ahead, keep at it big O.
In summary: I just don’t think my instinctive aversion to death applies reasonably to situations where clones of me are being created and destroyed all willy-nilly. In such situations I measure utility more abstractly.
It’s not just about the USB sticks—to me that seems inert. But if he’s running you off those USB sticks for (let’s say) a few hours every day, then you could (in fact there is a 1000/1001 chance that you will) wake up tomorrow morning and find yourself running from one of those drives, and know that there is a clear horizon of a few hours on the subjective experiences you can anticipate. This is a prospect which I, at least, would find terrifying.
Maybe Omega exists in a higher spatial dimension and just takes an instantaneous snapshot of the universal finite state automata you exist in (as a p-zombie).
I guess I’m a bit tired of “God was unable to make the show today so the part of Omniscient being will be played by Omega” puzzles, even if in my mind Omega looks amusingly like the Flying Spaghetti Monster.
Particularly in this case where Omega is being explicitly dishonest—Omega is claiming either to be sufficiently omniscient to predict my actions, or insufficiently omniscient to predict the result of a ‘fair’ coin, except that the ‘fair’ coin is explicitly predetermined to always give the same result . . . except . . .
What’s the point of using rationalism to think things through logically if you keep placing yourself into illogical philosophical worlds to test the logic?
The coin is not predetermined, and it doesn’t matter if Omega has hand-selected every result of the coin toss, as long as we don’t have any reason to slide the probability of the result in either direction.
Could be a quantum coin, which is unpredictable under current laws of physics. Anyway, this stuff actually does have applications in decision theory. Quibbling over the practical implementations of the thought experiment is not actually useful to you or anybody else.
More precisely it is exactly predictable but for most practical purposes can be treated as equivalent to an unpredictable coin.
By ‘unpredictable’ I mean ‘under current formalisms of physics it is not possible for us to accumulate enough information to predict it’.
By ‘more precisely’ I mean… no. The way you have phrased it makes your statement false.
You can predict what the future outcome of a quantum coin will be (along the lines of branches with heads and tails their respective amplitudes). A related prediction you cannot make—when the quantum event has already occurred but you have not yet observed it you cannot predict what your observation will be (now that ‘your’ refers only to the ‘you’ in the specific branch).
Again, for practical purposes—for most people’s way of valuing most future outcomes—the future coin can be treated as though it is an unpredictable coin.
I was using ‘you’ and ‘us’ in the colloquial sense of the subjective experiences of a specific, arbitrary continuity chosen at random from the set of Everett branches in the hypothetical branch of the world tree that this counterfactual occurs in.
Now, I CAN start listing my precise definitions for every potentially ambiguous term I use, or we could simply agree not to pick improbable and inconsistent interpretations of the other’s words. Frankly, I’d much prefer the latter, as I cannot abide pedants.
EDIT: Or you could downvote all my posts. That’s cool too.
Since the distinction is of decision theoretical relevance and the source of much confusion I choose to clarify incorrect usages of ‘unpredictable’ in this particular environment. By phrasing it as ‘more precisely’ I leave plenty of scope for the original speaker to be assumed to be just speaking loosely.
Unfortunately you chose to fortify and defend an incorrect position instead of allowing the additional detail. Now you have given a very nice definition of ‘you’ but even with that definition both of your claims are just as incorrect as when they started. Fixing ‘you’ misses the point.
You are probably too entrenched in your position to work with but for anyone else who wants to talk about ‘unpredictable’ quantum coins, qualifiers like (“for most intents and purposes”, “effectively”) are awesome!
By reading the quantum coin flip, you definitely entangle yourself with it, and there’s no way you’re going to stay coherent.
As a hard-core Everettian, I find the original usage and the followup totally unobjectionable in principle. Your clarification was good except for the part where it said Ati’s statement was wrong. There exists a reading of the terms which leaves those wrong, yes. So don’t use that one.
It should be noted that ‘all my posts’ does not refer to karma-assassination here. Rather, that three comments here were downvoted. This is correct (and in accord with my downvoting policy).
And I perceived you as being needlessly pedantic and choosing implausible interpretations of my words so that you could correct me. You’ll note that your comment karma stands. I am, in fact, aware of quantum mechanics, and you are, of course, entirely correct. Coins behave in precisely deterministic ways, even if they rely on, say, radioactive decay. The causality just occurs in many Everett branches. That said, there is no way that before ‘you’ ‘flip the coin’ you can make a prediction about its subjective future state, and have more than half of your future selves be right. If that’s not ‘unpredictable’ by the word’s colloquial definition, then I’m not sure the word has any meaning.
You will notice that when I said that the coin is unpredictable, I did not claim, or even imply that the world was undeterministic, or that quantum mechanics was wrong. If I had said such a thing, you would have right to correct me. As it is, you took the opportunity to jump on my phrasing to correct me of a misconception that I did not, in fact, possess. That is being pedantic, it is pointless, and above all it is annoying. I apologize for rudeness, but trying to catch up others on their phrasing is a shocking waste of intellect and time.
EDIT: Again, I can totally discard every word that’s entrenched in good, old-fashioned single-universe connotations, and spell out all the fascinating multiverse implications of everything I say, if that will make you happy—but it will make my posts about five times longer, and it will make it a good deal more difficult to figure out what the hell I’m saying, which rather defeats the purpose of using language.
I’ll note that I reject your ‘implausible’ claim, object to all insinuations regarding motive, stand by my previous statements and will maintain my policy of making mild clarifications when the subject happens to come up.
There seems to be little else to be said here.
As you like. Though I do hope you apply your strident policy of technical correctness in your home life, for consistency’s sake.
For example: someone (clearly wrong) like me would merely say, in our archaic and hopelessly monocosmological phrasing, ‘I am going to lunch.’ This is clearly nonsense. You will, over the set of multiverse branches, do a great many things, many of them having nothing to do with food, or survival. The concepts of ‘I’ and ‘lunch’ are not even particularly well defined.
In contrast, someone held to your standard of correctness would have to say ‘The computation function implemented in the cluster of mass from which these encoded pressure waves are emanating will execute a series of actions for which they predict that in the majority of future Everett branches of this fork of the world tree, the aforementioned cluster of mass will accumulate new amplitude and potential energy through the process of digestion within the next hour and fifteen minutes.’
Clearly this is more efficient and less confusing to the reader.
I consider the selection of analogies made in the parent to constitute a misrepresentation (and fundamental misunderstanding) of the preceding conversation.
I convinced myself to one-box in Newcomb by simply treating it as if the contents of the boxes magically change when I made my decision. Simply draw the decision tree and maximize u-value.
I convinced myself to cooperate in the Prisoner’s Dilemma by treating it as if whatever decision I made the other person would magically make too. Simply draw the decision tree and maximize u-value.
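Both of those “magical” readings amount to comparing payoffs along the diagonal of the game matrix, where the prediction (or the other player’s move) always matches your own choice. A minimal sketch for the Newcomb case, assuming the standard $1,000,000 / $1,000 payoffs (which this thread doesn’t state explicitly):

```python
# "Contents magically track my choice" reading of Newcomb's problem.
# Assumed payoffs: opaque box holds $1,000,000 iff Omega predicted
# one-boxing; the transparent box always holds $1,000.
def payoff(choice, predicted):
    big = 1_000_000 if predicted == "one-box" else 0
    small = 1_000 if choice == "two-box" else 0
    return big + small

# Under the "magical" reading, the prediction always equals the choice,
# so we only compare the diagonal entries of the payoff matrix:
one_box = payoff("one-box", "one-box")
two_box = payoff("two-box", "two-box")
```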
It seems that Omega is different because I actually have the information, where in the others I don’t.
For example, in Newcomb, if we could see the contents of both boxes, then I should two-box, no? In the Prisoner’s Dilemma, if my opponent decides before me and I observe the decision, then I should defect, no?
I suspect that this means that my thought process in Newcomb and the Prisoner’s Dilemma is incorrect. That there is a better way to think about them that makes them more like Omega. Am I correct? Does this make sense?
Yes, the objective in designing this puzzle was to construct an example where according to my understanding of the correct way to make decisions, the correct decision looks like losing. In other cases you may say that you close your eyes, pretend that your decision determines the past or other agents’ actions, and just make the decision that gives the best outcome. In this case, you choose the worst outcome. The argument is that on reflection it still looks like the best outcome, and you are given an opportunity to think about what’s the correct perspective from which it’s the best outcome. It binds the state of reality to your subjective perspective, where in many other thought experiments you may dispense with this connection and focus solely on the reality, without paying any special attention to the decision-maker.
In Newcomb, before knowing the box contents, you should one-box. If you know the contents, you should two-box (or am I wrong?)
In Prisoner, before knowing the opponent’s choice, you should cooperate. After knowing the opponent’s choice, you should defect (or am I wrong?).
If I’m right in the above two cases, doesn’t Omega look more like the “after knowing” situations above? If so, then I must be wrong about the above two cases...
I want to be someone who in situation Y does X, but when Y&Z happens, I don’t necessarily want to do X. Here, Z is the extra information that I lost (in Omega), the opponent has chosen (in Prisoner) or that both boxes have money in them (in Newcomb). What am I missing?
No—in the prisoners’ dilemma, you should always defect (presuming the payoff matrix represents utility), unless you can somehow collectively pre-commit to co-operating, or it is iterative. This distinction you’re thinking of only applies when reverse causation comes into play.
I really fail to see why you’re all so fascinated by Newcomb-like problems. When you break causality, all logic based on causality stops functioning. If you try to model it mathematically, you will always get an inconsistent model.
There’s no need to break causality. You are a being implemented in chaotic wetware. However, there’s no reason to think we couldn’t have rational agents implemented in a much more predictable form, as Python routines for example, so that any being with superior computation power could simply inspect the source and determine what the output would be.
In such a case, Newcomb-like problems would arise, perfectly lawfully, under normal physics.
In fact, Newcomb-like problems fall naturally out of any ability to simulate and predict the actions of other agents. Omega as described is essentially the limit as predictive power goes to infinity.
This gives me the intuition that trying to decide whether to one-box or two-box on Newcomb is like trying to decide what 0^0 is; you get your intuition by following a limit process, but that limit process produces different results depending on the path you take.
It would be interesting to look at finitely good predictors. Perhaps we can find something analogous to the result that lim_{(x,y)→(0,0)} x^y is path-dependent.
If we define an imperfect predictor as a perfect predictor plus noise, i.e. produces the correct prediction with probability p regardless of the cognition algorithm it’s trying to predict, then Newcomb-like problems are very robust to imperfect prediction: for any p > .5 there is some payoff ratio great enough to preserve the paradox, and the required ratio goes down as the prediction improves. e.g. if 1-boxing gets 100 utilons and 2-boxing gets 1 utilon, then the predictor only needs to be more than 50.5% accurate. So the limit in that direction favors 1-boxing.
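That break-even figure is easy to check. A small sketch of the same setup (100 utilons for a correctly predicted one-box, 1 extra utilon for two-boxing; the function name is mine):

```python
def breakeven_accuracy(big, small):
    """Accuracy p at which E[one-box] = p*big equals
    E[two-box] = (1-p)*big + small, for a predictor that is
    correct with probability p regardless of your algorithm."""
    return (big + small) / (2 * big)

# With the comment's numbers (100 utilons vs. 1 utilon), the predictor
# only needs to beat 50.5% accuracy for one-boxing to win.
p = breakeven_accuracy(100, 1)
```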
What other direction could there be? If the prediction accuracy depends on the algorithm-to-be-predicted (as it would in the real world), then you could try to be an algorithm that is mispredicted in your favor… but a misprediction in your favor can only occur if you actually 2-box, so it only takes a modicum of accuracy before a 1-boxer who tries to be predictable is better off than a 2-boxer who tries to be unpredictable.
I can’t see any other way for the limit to turn out.
If you have two agents trying to precommit not to be blackmailed by each other / precommit not to pay attention to the others precommitment, then any attempt to take a limit of this Newcomblike problem does depend on how you approach the limit. (I don’t know how to solve this problem.)
The value(s) for which the limit is being taken here is unidirectional predictive power, which is loosely a function of the difference in intelligence between the two agents; intuitively, I think a case could be made that (assuming ideal rationality) the total accuracy of mutual behavior prediction between two agents is conserved in some fashion, that doubling the predictive power of one unavoidably would roughly halve the predictive power of the other. Omega represents an entity with a delta-g so large vs. us that predictive power is essentially completely one-sided.
From that basis, allowing the unidirectional predictive power of both agents to go to infinity is probably inherently ill-defined and there’s no reason to expect the problem to have a solution.
Such a being would be different from a human in fundamental ways. Imagine knowing with certainty that your actions can be predicted perfectly by the guy next door, even taking into account that you are trying to be hard to predict?
A (quasi)rational agent with access to genuine randomness (such as a human) is a different matter. A superintelligence could almost perfectly predict the probability distribution over my actions, but by quantum entanglement it would not be able to predict my actual actions.
Whaddaya mean humans are rational agents with access to genuine randomness? That’s what we’re arguing about in the first place!
Perhaps Omega is entangled with your brain such that in all the worlds in which you would choose to one-box, he would predict that you one-box, and all the worlds in which you would choose to two-box, he would predict that you two-box?
In the original formulation, if Omega expects you to flip a coin, he leaves box B empty.
You wouldn’t know this with certainty* because it wouldn’t be true.
(*unless you were delusional)
The guy next door is on roughly your mental level. Thus, the guy next door can’t predict your actions perfectly, because he can’t run a perfect simulation of your mind that’s faster than you. He doesn’t have the capacity.
And he certainly doesn’t have the capacity to simulate the environment, including other people, while doing so.
Humans may or may not generally have access to genuine randomness.
It’s as yet unknown whether we even run on quantum randomness; and it’s also unprovable that quantum randomness is actually genuine randomness, and not just based on effects we don’t yet understand, as so many other types of randomness have turned out to be.
You’re not taking this in the least convenient possible world. Surely it’s not impossible in principle that your neighbor can simulate you and your environment. Perhaps your neighbor is superintelligent?
It’s ALSO not impossible in principle in the real world. A superintelligent entity could, in principle, perfectly predict my actions. Remember, in the Least Convenient Possible World quantum “randomness” isn’t random.
As such, this ISN’T a fundamental difference between humans and “such beings”. Which was all I set out to demonstrate.
I was using the “most plausible world” on the basis that it seemed pretty clear that that was the one Roko intended. (Where your neighbour isn’t in fact Yahweh in disguise). EDIT: Probably should specify worlds for things in this kind of environment. Thanks, the critical environment here is helping me think about how I think/argue.
If you believe the Many Worlds Interpretation, then quantum randomness just creates copies in a deterministic way.
You cannot do that without breaking Rice’s theorem. If you assume you can find out the answer from someone else’s source code → instant contradiction.
You cannot work around Rice’s theorem or around causality by specifying 50.5% accuracy independently of the modeled system: any accuracy higher than 50%+epsilon against arbitrary algorithms is equivalent to indefinitely good accuracy by repeated prediction (a standard cryptographic amplification result), and 50%+epsilon itself doesn’t cause the paradox.
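The amplification result alluded to above can be sketched: if each prediction is independently correct with probability 50%+epsilon, a majority vote over many repetitions is correct almost always. This is only a toy simulation under the independence assumption; the whole point of the Rice's-theorem objection is that independence fails against adversarial algorithms.

```python
import random

def majority_of_predictions(p, n, trials=2000, seed=0):
    """Estimate how often a majority vote of n independent predictions,
    each individually correct with probability p, gets the right answer."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        hits = sum(rng.random() < p for _ in range(n))
        if hits > n // 2:
            correct += 1
    return correct / trials

# A 55%-accurate predictor, repeated 101 times and majority-voted,
# is right far more often than any single prediction is.
single = majority_of_predictions(0.55, 1)
voted = majority_of_predictions(0.55, 101)
assert voted > single
```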
Give me one serious mathematical model of Newcomb-like problems in which the paradox emerges while preserving causality. Here are some examples. Whichever way you model it, you either get a trivial solution (one-box), or a causality break, or Omega loses.
Model #1: You decide first what you would do in every situation, Omega decides second, and you then implement your initial decision table without being allowed to switch. Game theory says you should implement one-boxing.
Model #2: You decide first what you would do in every situation, Omega decides second, and you are then allowed to switch. Game theory says you should precommit to one-boxing, then implement two-boxing; Omega loses.
Model #3: You decide first what you would do in every situation, Omega decides second, and you are then allowed to switch. If Omega always decides correctly, then it bases its decision on your switch, which either turns this into model #1 (you cannot really switch; the precommitment is binding), or breaks causality.
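The first two models above can be sketched as a payoff computation. This is a toy illustration with the standard Newcomb amounts ($1,000,000 in box B if one-boxing is predicted, $1,000 always in box A); the function name and amounts are mine, not from the thread.

```python
# Newcomb payoff as a function of the strategy you implement and the
# strategy Omega predicted. Box B holds $1,000,000 iff one-boxing was
# predicted; box A always holds $1,000.
def payoff(strategy, prediction):
    box_b = 1_000_000 if prediction == "one-box" else 0
    box_a = 1_000
    return box_b if strategy == "one-box" else box_b + box_a

# Model #1: the precommitment is binding, so the prediction always matches
# the implemented strategy. The committed one-boxer wins.
assert payoff("one-box", "one-box") == 1_000_000
assert payoff("two-box", "two-box") == 1_000

# Model #2: you may switch after Omega locks in its (now stale) prediction.
# The agent who precommitted to one-boxing and then two-boxes gets both.
assert payoff("two-box", "one-box") == 1_001_000
```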
Rice’s theorem says you can’t predict every possible algorithm in general. Plenty of particular algorithms can be predictable. If you’re running on a classical computer and Omega has a copy of you, you are perfectly predictable.
And all of your choices are just as real as they ever were, see the OB sequence on free will (I think someone referred to it already).
And the argument that Omega just needs predictive power of 50.5% to cause the paradox only works if it works against ANY arbitrary algorithm. Having that power against any arbitrary algorithm breaks Rice’s theorem; having that power (or even 100%) against just a limited subset of algorithms doesn’t cause the paradox.
If you take the strict decision-tree precommitment interpretation, then you fix causality. You decide first, Omega decides second, game theory says one-box, problem solved.
Decision tree precommitment is never a problem in game theory, as precommitment of the entire tree commutes with decisions by other agents:
A decides what f(X), f(Y) to do if B does X or Y. B does X. A does f(X)
B does X. A decides what f(X), f(Y) to do if B does X or Y. A does f(X)
are identical, as B cannot decide based on f. So the changing your mind problem never occurs.
With omega:
A decides what f(X), f(Y) to do if B does X or Y. B does X. A does f(X) - B can answer depending on f
B does X. A decides what f(X), f(Y) to do if B does X or Y. A does f(X) - somehow not allowed any more
I don’t think the paradox exists in any plausible mathematization of the problem. It looks to me like another of those philosophical problems that exist because of the sloppiness of natural language and very little more; I’m just surprised that the OB/LW crowd cares about this one and not about others. OK, I admit I really enjoyed it the first time I saw it, but just as something fun, nothing more than that.
I don’t know why nobody mentioned this at the time, but that’s hardly an unpopular view around here (as I’m sure you’ve noticed by now).
The interesting thing about Newcomb had nothing to do with thinking it was a genuine paradox—just counterintuitive for some.
They don’t require breaking causality. The argument works if Omega is barely predicting you above chance. I’m sure there are plenty of normal people who can do that just by talking to you.
There are also more important reasons. Take the doomsday argument. You can use the fact that you’re alive now to predict that we’ll die out “soon”. Suppose you had a choice between saving a life in a third-world country that likely wouldn’t amount to anything, or donating to SIAI to help in the distant future. You know it’s very unlikely for there to be a distant future. It’s like Omega did his coin toss, and if it comes up tails, we die out early and he asks you to waste the money by donating to SIAI. If it comes up heads, you’re in the future, and it’s better if you would have donated.
That’s not some thing that might happen. That’s a decision you have to make before you pick a charity to donate to. Lives are riding on this. That’s if the coin lands on tails. If it lands on heads, there is more life riding on it than has so far existed in the known universe. Please choose carefully.
Arguments like these remind me of students’ mistakes from Algorithms and Data Structures 101 - statements like that are very intuitive, absolutely wrong, and once you figure out why this reasoning doesn’t work it’s easy to forget that most people didn’t go through this ever.
What is required is Omega predicting better than chance in the worst case. Predicting correctly with a ridiculously tiny chance of error against the “average” person is worthless.
To avoid Omega and causality silliness, and just to demonstrate this intuition, let’s take a slightly modified version of Boolean satisfiability—but instead of one formula we have three formulas of the same length. If all three are identical, return true or false depending on the formula’s satisfiability; if they’re different, return true iff the number of one bits in the input is odd (or some other trivial property).
It is obviously NP-complete, as any satisfiability problem reduces to it by concatenating it three times. If we use exponential brute force to solve the hard case, average running time is O(n) for scanning the string, plus O(2^(n/3)) for brute forcing incurred only a 2^-(2n/3) fraction of the time, which contributes O(1). So we can solve this NP-complete problem in average linear time.
What happened? We were led astray by intuition, and assumed that problems that are difficult in worst case cannot be trivial on average. But this equal weighting is an artifact—if you tried reducing any other NP problem into this, you’d be getting very difficult ones nearly all the time, as if by magic.
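The construction above can be sketched in code. This is a toy version under simplifying assumptions: the "formulas" are just bit-strings, and the brute-force branch is a stand-in enumeration rather than real SAT solving; all names are mine.

```python
from itertools import product
import random

def solve_tripled(bits):
    """Toy version of the tripled-satisfiability construction: the input
    encodes three equal-length 'formulas' as bit-string thirds. If all
    three thirds are equal, do the exponential hard-case work (a stand-in
    enumeration of all 2^(n/3) assignments); otherwise answer the trivial
    parity question in O(n)."""
    n = len(bits)
    a, b, c = bits[: n // 3], bits[n // 3 : 2 * n // 3], bits[2 * n // 3 :]
    if a == b == c:
        # Stand-in for brute-force search over 2^(n/3) assignments.
        count = sum(1 for _ in product("01", repeat=len(a)))
        return count > 0
    return bits.count("1") % 2 == 1

# On uniformly random inputs the hard branch fires with probability
# 2^-(2n/3), so it essentially never occurs by chance:
rng = random.Random(0)
sample = ["".join(rng.choice("01") for _ in range(30)) for _ in range(200)]
hard = sum(1 for s in sample if s[:10] == s[10:20] == s[20:])
assert hard == 0
```

Reducing a genuinely hard SAT instance into this problem, by contrast, always produces three identical thirds, i.e. lands squarely in the exponentially rare hard branch.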
Back to Omega—even if Omega predicts normal people very well, as long as there is any thinking being it cannot predict, Omega must break causality. And such beings are not just hypothetical—people who decide based on a coin toss are exactly like that. Silly rules about disallowing chance merely make the counterexamples more complicated; Omega and Newcomb are still as much based on sloppy thinking as ever.
I don’t know any reason why a coin toss would be the best choice in Newcomb’s paradox. If you decide based on reason, and don’t decide to flip a coin, and Omega knows you well, he can predict your action above chance. The paradox stands.
Omega cannot know coin flip results without violating causality. So he either puts the million in the box or not. As a result, no matter which way he decides, Omega has a 50% chance of violating his own rules, which was supposedly impossible, breaking the problem.
What I mean is, if you change the scenario so he only has to predict above chance if you don’t flip a coin, and he isn’t always getting it right anyway, the same basic principle applies, but it doesn’t violate causality.
The obvious extensions of the problem to cases with failable Omega are:
P( $1,000,000) = P(onebox)
Reward = $1,000,000 * P(onebox)
In the Bayesian interpretation, P() would be Omega’s subjective probability. In the frequentist interpretation, the question doesn’t make any sense, as you make a single boxing decision, not a large number of tiny boxing decisions. Either way P() is very ill-defined.
No more so than other probabilities. Probabilities about future decisions of other actors aren’t disprivileged, that would be free will confusion. And are you seriously claiming that the probabilities of a coin flip don’t make sense in a frequentist interpretation? That was the context. In the general case it would be the long term relative frequency of possible versions of you similar enough to you to be indistinguishable for Omega deciding that way or something like that, if you insisted on using frequentist statistics for some reason.
(this comment assumes “Reward = $1,000,000 * P(onebox)”)
You misunderstand the frequentist interpretation—the sample size is 1: you either decide yes or decide no. To generalize from a single decider needs a prior reference class (“coin tosses”), getting us into Bayesian subjective interpretations. Frequentists don’t have any concept of “probability of hypothesis” at all, only “probability of data given hypothesis”, and the only way to connect them is using priors. “Frequency among possible worlds” is also a Bayesian thing that weirds frequentists out.
Anyway, if Omega has amazing prediction powers, and P() can be deterministically known by looking into the box, this is far more valuable than a mere $1,000,000! Let’s say I make my decision by randomly generating some string and checking whether it’s a valid proof of the Riemann hypothesis—if P() is non-zero, I’ve made myself $1,000,000 anyway.
I understand that there’s an obvious technical problem if Omega rounds the number to whole dollars, but that’s just minor detail.
And actually, it is a lot worse in the popular problem formulation of “if your decision relies on randomness, there will be no million” that tries to work around coin tossing. In that case a person randomly trying to prove a false statement gets the million (as no proof could work, so his decision was reliable), and a person randomly trying to prove a true statement gets $0 (as there’s a non-zero chance of him randomly generating a correct proof).
Another fun idea would be measuring both position and velocity of an electron—tossing a coin to decide either way, measuring one and getting the other from Omega.
Possibilities are just endless.
The issue was whether the formulation makes sense, not whether it makes frequentists freak out (and it’s not substantially different from e.g. drawing from an urn for the first time). In either case P() was the probability of an event, not a hypothesis.
In these sorts of problems you are supposed to assume that the dollar amounts match your actual utilities (as you observe your exploit doesn’t work anyway for tests with a probability of <0.5*10^-9 if rounding to cents, and you could just assume that you already have gained all knowledge you could gain through such test, or that Omega possesses exactly the same knowledge as you except for human psychology, or whatever).
Agreed. This problem seems uninteresting to me too. Though more realistic Newcomb-like problems are interesting; for there are parts of life where Newcombian reasoning works for real.
On second thoughts, since many clever philosophers spend careers on these problems, I may be missing something.
The obvious complaint about “would you choose X or Y given that Omega already knows your actions” is that it is logically inconsistent; if Omega already knows your actions, the word “choose” is nonsense. Strictly speaking, “choose” is nonsense anyway; it takes the naive free will point of view in its everyday usage.
In order to untangle this, a sophisticated understanding of what we mean by “choose” is needed. I may post on this. My intuition is that if we stick to a rigorous meaning of “choose”, the question will have a well-defined answer that no-one will dispute, however what this answer is will depend on the definition of “choose” that you, um, choose, so to speak…
I find the problem interesting, so I’ll try to explain why I find it interesting.
So there are these blogs called Overcoming Bias and Less Wrong, and the people posting on it seem like very smart people, and they say very reasonable things. They offer to teach how to become rational, in the sense of “winning more often”. I want to win more often too, so I read the blogs.
Now a lot of what these people are saying sounds very reasonable, but it’s also clear that the people saying these things are much smarter than me; so much so that although their conclusions sound very reasonable, I can’t always follow all the arguments or steps used to reach those conclusions. As part of my rationalist training, I try to notice when I can follow the steps to a conclusion, and when I can’t, and remember which conclusions I believe in because I fully understand it, and which conclusions I am “tentatively believing in” because someone smart said it, and I’m just taking their word for it for now.
So now Vladimir Nesov presents this puzzle, and I realize that I must not have understood one of the conclusions (or I did understand them, and the smart people were mistaken), because it sounds like if I were to follow the advice of this blog, I’d be doing something really stupid (depending on how you answered VN’s problem, the stupid thing is either “wasting $100” or “wasting $4950”).
So how do I reconcile this with everything I’ve learned on this blog?
Think of most of the blog as a textbook, with VN’s post being an “exercise to the reader” or a “homework problem”.
The primary reason for resolving Newcomb-like problems is to explore the fundamental limitations of decision theories.
It sounds like you are still confused about free will. See Righting a Wrong Question, Possibility and Could-ness, and Daniel Dennett’s lecture here.
yes, I am confused about free will, but I think that this confusion is legitimate given our current lack of knowledge about how the human mind works.
I hope I’m not making obvious errors about free will. But if I am, then I’d like to know...
I think I’m not confused about free will, and that the links I gave should help to resolve most of the confusion. Maybe you should write a blog post/LW article where you formulate the nature of your confusion (if you still have it after reading the relevant material), I’ll respond to that.
Not really—all that is necessary is that Omega is a sufficiently accurate predictor that the payoff matrix, taking this accuracy into account, still amounts to a win for the given choice. There is no need to be a perfect predictor. And if an imperfect, 99.999% predictor violates free will, then it’s clearly a lost cause anyway (I can predict with similar precision many behaviours about people based on no more evidence than their behaviour and speech, never mind godlike brain introspection). Do you have no “choice” in deciding to come to work tomorrow, if I predict based on your record that you’re 99.99% reliable? Where is the cut-off beyond which free will gets lost?
Humans are subtle beasts. If you tell me that you have predicted that I will go to work based upon my 99.99% attendance record, the probability that I will go to work drops dramatically upon me receiving that information, because there is a good chance that I’ll not go just to be awkward. This option of “taking your prediction into account, I’ll do the opposite to be awkward” is why it feels like you have free will.
Chances are I can predict such a response too, and so won’t tell you of my prediction (or will tell you in such a way that you will be more likely to attend, e.g. “I’ve a $50 bet you’ll attend tomorrow. Be there and I’ll split it 50:50”). It doesn’t change the fact that in this particular instance I can foretell the future with a high degree of accuracy. Why then would it violate free will if Omega could predict your actions in this different situation (one where he’s also able to predict the effects of telling you) to a similar precision?
Because that’s pretty much our intuitive definition of free will; that it is not possible for someone to predict your actions, announce it publicly, and still be correct. If you disagree, we are disagreeing about the intuitive definition of “free will” that most people carry around in their heads. At least admit that most people would be unsurprised if a person predicted that they would (e.g.) brush their teeth in the morning (without telling them in advance that it had predicted that), versus predicting that they would knock a vase over, and then as a result of that prediction, the vase actually getting knocked over.
Then take my bet situation. I announce your attendance, and cut you in with a $25 stake in attendance. I don’t think it would be unusual to find someone who would indeed appear 99.99% of the time—does that mean that person has no free will?
People are highly, though not perfectly, predictable under a large number of situations. Revealing knowledge about the prediction complicates things by adding feedback to the system, but there are lots of cases where it still doesn’t change matters much (or even increases predictability). There are obviously some situations where this doesn’t happen, but for Newcomb’s paradox, all that is needed is a predictor for the particular situation described, not any general situation. (In fact Newcomb’s paradox is equally broken by a similar revelation of knowledge. If Omega were to reveal its prediction before the boxes are chosen, a person determined to do the opposite of that prediction opens it up to a simple Epimenides paradox.)
On second thoughts, since many clever philosophers spend careers on these problems, I may be missing something.
Nah, they just need something to talk about.
I’m very torn on this problem. Every time I think I’ve got it figured out and start typing out my reasons why, I change my mind, and throw away my 6+ paragraph explanation and start over, arguing the opposite case, only to change my mind again.
I think the problem has to do with strong conflicts between my rational arguments and my intuition. This problem is a much more interesting koan for me than one hand clapping, or tree in the forest.
I think my answer would be “I would have agreed, had you asked me when the coin chances were .5 and .5. Now that they’re 1 and 0, I have no reason to agree.”
Seriously, why stick with an agreement you never made? Besides, if Omega can predict me this well he knows how the coin will come up and how I’ll react. Why then, should I try to act otherwise. Somehow, I think I just don’t get it.
It doesn’t matter too much, but we can assume that Omega doesn’t know how the coin will come up.
That would be rather futile, wouldn’t it? Of course, deciding to give Omega $100 now isn’t trying to change how you would react, it is just choosing your reaction.
This problem seems conceptually identical to Kavka’s toxin puzzle; we have merely replaced intending to drink the poison/pay $100 with being the sort of person whom Omega would predict would do it.
Since, as has been pointed out, one needn’t be a perfect predictor for the game to work, I think I’ll actually try this on some of my friends.
Thanks for reminding me of Kavka’s puzzle. I think that puzzle is unnecessarily mental in its formulation—for example, you have to “intend”. It’s less confusing when you work with the more technical concepts of decision-making, evidence, preference and precommitment.
I can’t imagine how you are going to perform this on your friends...
The main problem, I think, is getting them to believe that I’m a reliable predictor (i.e. that I predict as well as I claim I do).
Actually, I don’t know that if I do this it will show anything relevant to the problem under consideration. But I think it will show something. It has in fact already shown that I believe that 59% of them would agree to give me the money, either because they are sufficiently similar to Eliezer, or because they enjoy random acts of silliness (and the amount of money involved will be pretty trivial).
Did you do it? And if so, did you give away money to the friends you predicted would have given you money, if the coin came up that way?
How much money did you lose?
No, I never got around to actually doing it I’m afraid.
Whether I give Omega the $100 depends entirely on whether there will be multiple iterations of coin-flipping. If there will be multiple iterations, giving Omega the $100 is indeed winning, just like buying a financial instrument that increases in value is winning.
No, there are no iterations. Omega flies away from your galaxy, right after finishing the transaction. (Added to P.S.)
In that case, I’d hate to disappoint Omega, but there’s no incentive for me to give up my $100. A utility of 0 is better than a negative utility, and if the coin-flip is deterministic, I won’t be serving the interests of my alternate-universe self. Why would I choose otherwise?
Would you prefer to choose otherwise if you considered the deal before the actual coin toss, and arrange the precommitment to that end?
Yes, then, following the utility function you specified, I would gladly risk $100 for an even chance at $10000. Since Omega’s omniscient, I’d be honest about it, too, and cough up the money if I lost.
If it’s rational to do this when Omega asks you in advance, isn’t it also rational to make such a commitment right now? Whether you make the commitment in response to Omega’s notification, or on a whim when considering the thought experiment in response to a blog post, makes no difference to the payoff. If you now commit to “if this exact situation comes up, I will pay the $100 if I lose the coinflip”, and p(x) is the probability of this situation occurring, you will achieve a net gain of $4950*p(x) over a non-committer (a very small number admittedly, given that p(x) is tiny, but for the sake of the thought experiment all that matters is that it’s positive).
Given that someone who makes such a precommitment comes out ahead of someone who doesn’t—shouldn’t you make such a commitment right now? Extend this and make a precommitment to always take the action that would maximise your average returns in all such Newcomblike situations and you’re going to come off even better on average.
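The net-gain arithmetic in the comment above can be written out explicitly. A minimal sketch; the function names are mine, and p_x stands for the comment's p(x).

```python
# Per-encounter edge of a committed payer over a refuser, using the post's
# numbers: lose $100 on tails, win $10,000 on heads, each with probability 0.5.
def committer_edge(pay=100, prize=10_000):
    return 0.5 * (-pay) + 0.5 * prize   # = 4950, matching the post

# Overall expected gain of committing now: the edge times the (tiny)
# probability p_x of ever facing this exact situation.
def expected_gain(p_x, pay=100, prize=10_000):
    return p_x * committer_edge(pay, prize)

assert committer_edge() == 4_950
assert expected_gain(1e-6) > 0   # positive for any p_x > 0
```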
No, I will not precommit to giving up my $100 for cases where Omega demands the money after the coin flip has occurred. There is no incentive to precommit in those cases, because the outcome is already against me and there’s not a chance that it “would” go in my favour.
At that point, it’s no longer a precommittal—it’s how you face the consequences of your decision whether to precommit or not.
Note that the hypothetical loss case presented in the post is not in fact the decision point—that point is when you first consider the matter, which is exactly what you are doing right now. If you would really change your answer after considering the matter, then having now done so, have you changed it?
If you want to obtain the advantage of someone who makes such a precommittal (and sticks to it), you must be someone who would do so. If you are not such a person (and given your answer, you are not) it is advantageous to change yourself to be such a person, by making that precommitment (or better, a generalised “I will always take the path would have maximised returns across the distribution of counterfactual outcomes in Newcomblike situations”) immediately.
Such commitments change the dynamics of many such thought experiments, but usually they require that that commitment be known to the other person, and enforced some way (The way to win at Chicken is to throw your steering wheel out the window). Here though, Omega’s knowledge of us removes the need to explicit announcement, and it is in our own interests to be self-enforcing (or rather we wish to reliably enforce the decision on our future selves), or we will not receive the benefit. For that reason, a silent decision is as effective as having a conversation with Omega and telling it how we decide.
Explicitly announcing our decision thus only has an effect insofar as it keeps your future self honest. E.g. if you know you wouldn’t keep to a decision idly arrived at, but value your word such that you would stick to doing what you said you would despite its irrationality in that case, then it is currently in your interest to give your word. It’s just as much in your interest to give your word now, though—make some public promise that you would keep. Alternatively, if you have sufficient mechanisms in your mind to commit to such future irrational behaviour without a formal promise, it becomes unnecessary.
Maybe in thought-experiment-world. But if there’s a significant chance that you’ll misidentify a con man as Omega, then this tendency makes you lose on average.
Sure—all bets are off if you aren’t absolutely sure Omega is trustworthy.
I think this is a large part of the reason why the intuitive answer we jump to is rejection. Being told we believe a being making such extraordinary claims is different from actually believing them (especially when the claims may have unpleasant implications for our beliefs about ourselves), so we have a tendency to consider the problem with the implicit doubt we have for everyday interactions lurking in our minds.
Brianm understands reflective consistency!
Right now, yes, I should precommit to pay the $100 in all such situations, since the expected value is p(x)*$4950.
If Omega just walked up to me and asked for $100, and I had never considered this before, the value of this commitment is now p(x)*$4950 - $100, so I would not pay unless I thought there was more than a 2% chance this would happen again.
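The 2% figure in the comment above is just the break-even point of that subtraction. A quick check, with names of my own choosing:

```python
# Having already been asked, paying costs $100 now, while the commitment is
# worth p * $4,950 in expected future encounters. Paying wins when
# p * 4950 > 100, i.e. p > 100/4950, roughly 2%.
break_even = 100 / 4950
assert 0.020 < break_even < 0.021

assert 0.03 * 4950 - 100 > 0   # a 3% chance of recurrence justifies paying
assert 0.01 * 4950 - 100 < 0   # a 1% chance does not
```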
So after you observe the coin toss, and find yourself in a position where you’ve lost, you’ll give Omega your money? Why would you? It won’t ever reciprocate, and it won’t enforce the deal, its only enforcement are those $10000 that you know got away anyway, because you didn’t win the coin toss.
Yes, I’ll give Omega the money, because if I’m going to refuse to give Omega the money after the coin toss occurs, Omega knows ahead of time on account of his omniscience. If I had won, Omega could look at me and say, “You get no money, because I know you wouldn’t have really given me the $100 if you’d lost. Your pre-commitment wasn’t genuine.”
My answer to this is that integrity is a virtue, and breaking one’s promises reduces one’s integrity. And being a person with integrity is vital to the good life.
Then I repeat the question with MBlume’s corrections, to make the problem less convenient. Would you still follow up and murder 15 people, to preserve your personal integrity? It’s not a question of values, it’s a question of decision theory.
This thread assumes a precommitment. I would not precommit to murder.
I’m not sure what your point is here.
The point is that the distinction between $0.02 and a trillion lives is irrelevant to the discussion, which is about the structure of preference order assigned to actions, whatever your values are. If you are determined to pay off Omega, the reason for that must be in your decision algorithm, not in an exquisite balance between $100, personal integrity, and murder. If you are willing to carry the deal through (note that there isn’t even any deal, only your premeditated decision), the reason for that must lie elsewhere, not in the value of personal integrity.
To make that claim, you do need to first establish that he would accept a bet of 15 lives vs some reward in the first place, which I think is what he is claiming he would not do. There’s a difference between making a bet and reneging, and not accepting the bet. If you would not commit murder to save a million lives in the first place, then the refusal is for a different reason than just the fact that the stakes are raised.
Integrity is a virtue, not a value.
The values aren’t necessarily relevant after I’ve precommitted to the bet, but they’re absolutely relevant to whether I’d precommit to the bet. If murder is one of the options, count me out.
My reason for carrying the deal through is (partially) that it promotes virtue. I do not see any arguments that it cannot be so.
Too vague.
What’s vague? Let me try to spell this out in excruciating detail:
Making good on one’s commitments promotes the virtue of integrity.
Integrity is constitutive of good character.
One cannot consistently act as a person of good character without having it.
To act ethically is to act as a person of good character does.
Ethics specifies what one has most reason to do or want.
So, if you ask me what I have most reason to do in a circumstance where I’ve made a commitment, ceteris paribus, I’ll respond that I’ll make good on my commitments.
I know that this is a very old post, but I thought that I should add a link to The Counterfactual Prisoner’s Dilemma, which is a thought experiment Cousin_it and I independently came up with to demonstrate why you should care about this dilemma.
The setup is as follows:
In this case if you always pay you receive $9900, while if you never pay you receive nothing. Crucially, you perform better regardless of whether the coin comes up heads or tails.
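A minimal payoff sketch, assuming the standard formulation of the Counterfactual Prisoner’s Dilemma (an assumption here, since the quoted setup is not reproduced above): Omega flips a fair coin, asks you to pay $100 whichever way it lands, and pays you $10,000 if and only if you would also have paid had the coin landed the other way.

```python
# Payoff sketch for the Counterfactual Prisoner's Dilemma, assuming the
# standard formulation: you are asked for $100 regardless of the coin,
# and receive $10,000 iff you would have paid in the *other* branch.

def net_payoff(pays_on_heads: bool, pays_on_tails: bool, coin: str) -> int:
    """Net dollars for one run of the game under a fixed policy."""
    pays_now = pays_on_heads if coin == "heads" else pays_on_tails
    pays_counterfactually = pays_on_tails if coin == "heads" else pays_on_heads
    cost = -100 if pays_now else 0
    reward = 10_000 if pays_counterfactually else 0
    return cost + reward

# "Always pay" nets $9900 and "never pay" nets $0, whichever way the coin lands.
for coin in ("heads", "tails"):
    assert net_payoff(True, True, coin) == 9_900
    assert net_payoff(False, False, coin) == 0
```

This reproduces the comment’s claim: the always-pay policy does better than never-pay on both coin outcomes, not merely on average.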
So, is it reasonable to pre-commit to giving the $100 in the counterfactual mugging game? (Pre-commitment is one solution to the Newcomb problem.) On first glance, it seems that a pre-commitment will work.
But now consider “counter-counterfactual mugging”. In this game, Omega meets me and scans my brain. If it finds that I’ve pre-committed to handing over the $s in the counterfactual mugging game, then it empties my bank account. If I haven’t pre-committed to doing anything in counterfactual mugging, then it rewards me with $1 million. Damn.
So what should I pre-commit to doing, if anything? Should I somehow try to assess my likelihood of meeting Omega (in some form or other) and guess what sort of parlour game it is likely to play with me, and for what stakes? Has anyone got any idea how to do that assessment, without unduly privileging the games that we happen to have thought of so far? This way madness lies I fear...
The interest with these Omega games is that we don’t meet actual Omegas, but do meet each other, and the effects are sometimes rather similar. We do like the thought of friends who’ll give us $1000 if we really need it (say in a once-in-a-lifetime emergency, with no likelihood of reciprocity) because they believe we’d do the same for them if they really needed it. We don’t want to call that behaviour irrational. Isn’t that the real point here?
Not exactly madness, but Pascal’s wager. If you haven’t seen any evidence of Omega existing by now, nor any theory behind how predictions such as his could be possible, and word of his parlour game preferences has not reached you, then chances are that he is so unlikely in this universe that he is in the same category as Pascal’s wager.
There is one nice thing about the real-world friend case, which is that you actually might be in the reverse situation later. So it’s not just a counterfactual you’re considering; it’s a real future possibility.
Take that away and it’s more like Omega; but then it’s not the real-world problem anymore!
If I found myself in this kind of scenario then it would imply that I was very wrong about how I reason about anthropics in an ensemble universe (as with Pascal’s mugging or any sort of situation where an agent has enough computing power to take control of that much of my measure such that I find myself in a contrived philosophical experiment). In fact, I would be so surprised to find myself in such a situation that I would question the reasoning that led me to think one boxing was the best course of action in the first place, because somewhere along the way my model became very confused. (I’d still one box, but it would seem less obvious after taking into account the huge amount of previously unexpected structural uncertainty my model of the world suddenly has to deal with.)
I see some reasons for this perspective but I’m not sure.
On the one hand, I don’t know much about the distribution of agent preferences in an ensemble universe. But there may be enough long towers of nested simulations of agents like us to compensate for this.
Normally, you can assume your thought processes are uncorrelated with what’s out there. Newcomb-like problems, however, do have the state of the outside universe correlated with your actual thoughts, and this is what throws people off.
If you are unsure whether the state of the universe is X or Y (say with p = 1/2 for simplicity), and you can choose either option A or B, you can calculate the expected utility of choosing A vs. B by comparing (1/2)u(A,X) + (1/2)u(A,Y) with (1/2)u(B,X) + (1/2)u(B,Y).
In a Newcomb-like problem, where the state of the experiment actually depends on your choice, the expected-utility comparison becomes ≈1·u(A,X) + ≈0·u(A,Y) vs. ≈0·u(B,X) + ≈1·u(B,Y).
In this case, it boils down to “Is u(A,X) > u(B,Y)?”.
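A minimal numerical sketch of this comparison, using Newcomb’s standard payoffs as illustrative (hypothetical) utility numbers:

```python
# Sketch of the two expected-utility comparisons above, using Newcomb's
# standard payoffs as illustrative numbers (A = one-box, B = two-box,
# X = big box full, Y = big box empty).

u = {
    ("A", "X"): 1_000_000, ("A", "Y"): 0,
    ("B", "X"): 1_001_000, ("B", "Y"): 1_000,
}

def eu_independent(u, p_X=0.5):
    """World state uncorrelated with your choice: average over X and Y."""
    return {a: p_X * u[(a, "X")] + (1 - p_X) * u[(a, "Y")]
            for a in ("A", "B")}

def eu_correlated(u):
    """Newcomb-like: choosing A makes X (near-)certain and B makes Y
    (near-)certain, so the comparison reduces to u(A,X) vs u(B,Y)."""
    return {"A": u[("A", "X")], "B": u[("B", "Y")]}

# Treating the state as independent favors B (two-boxing); treating it as
# correlated with the choice favors A (one-boxing).
assert eu_independent(u)["B"] > eu_independent(u)["A"]
assert eu_correlated(u)["A"] > eu_correlated(u)["B"]
```

The flip in the ranking between the two functions is exactly the point of the comment: the same utility table gives opposite answers depending on whether the world state is treated as independent of, or determined by, your choice.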
It is not enough for Omega to have a decent record of getting it right, since you could probably do pretty well by reading people’s comments and guessing based on that.
If Omega made its prediction solely based on a comment you made on LessWrong, you should expect that if you choose A the universe will be in the same state as if you choose B: knowing your ultimate decision doesn’t tell you anything, since the only relevant evidence is what you said a month ago.
If, however, Omega actually simulates your thought process in sufficient detail to know for sure which choice you made, knowing that you ultimately decide to pick A is strong evidence that Omega has set up X, and if you choose B, you had better expect to see Y.
The reason that the answer changes is that the state of the box actually does depend on the thoughts themselves: it’s just that you thought the same thoughts when Omega was simulating you before filling the boxes/flipping the coin.
If you aren’t sure whether you’re just Omega’s simulation, you had better one-box/pay Omega. If we’re talking about a wannabe Omega that just makes decent predictions based off comments, then you defect (though if you actually expect a situation like this to come up, you argue that you won’t)
Omega’s actions depend only on your decision (action), or in this case counterfactual decision, not on your thoughts or the algorithm you use to reach the decision. The action of course depends on your thoughts, but that’s the usual case. You may move several steps back, seeking the ultimate cause, but that’s pretty futile.
There is a caveat: if you are an agent who is constructed to live in the world where Omega tossed its coin to come out tails, so that the state space for which your utility function and prior are defined doesn’t contain the areas corresponding to the coin coming up heads, you don’t need to give up $100. You only give up $100 as a tribute to the part of your morality specified on the counterfactual area of the state space.
I would one-box on Newcomb’s problem, and I believe I would give the $100 here as well (assuming I believed Omega).
With Newcomb, if I want to win, my optimal strategy is to mimic as closely as possible the type of person Omega would predict would take one box. However, I have no way of knowing what would fool Omega: indeed if it is a sufficiently good predictor there may be no such way. Clearly then the way to be “as close as possible” to a one-boxer is to be a one-boxer. A person seeking to optimise their returns will be a person who wants their response to such stimulus to be “take one box”. I do want to win, so I do want my response to be that, so it is: I’m capable of locking my decisions (making promises) in ways that forgo short-term gain for longer-term benefit.
The situation here is the same, even though I have already lost. It is beneficial for me to be that type of person in general (obscured by the fact that the situation is so unlikely to occur). Were I not the type of person who made the decision to pay out on loss, I would be the type of person that lost $10000 in an equally unlikely circumstance. Locking that response in now as a general response to such occurrences means I’m more likely to benefit than those who don’t.
Well, the other way to look at it is “What action leads me to win?” in the Newcomb problem, one-boxing wins, so you and I are in agreement there.
But in this problem, not-giving-away-$100 wins. Sure, I want to be the “type of person who one boxes”, but why do I want to be that person? Because I want to win. Being that type of person in this problem actually makes you lose.
The problem states that this is a one-shot bet, and that after you do or don’t give Omega the $100, he flies away from this galaxy and will never interact with you again. So why give him the $100? It won’t make you win in the long term.
Yes, but Omega isn’t really here yet, and you, Nebu, deciding right now that you will give him $100 does make you win, since it gives you a shot at $10000.
Right, so if a normal person offered me the bet (and assuming I could somehow know it was a fair coin) then yes, I would accept the bet.
If it was Omega instead of a normal person offering the bet, we run into some problems...
But if Omega doesn’t actually offer the bet, and just does what is described by Vladimir Nesov, then I wouldn’t give him the $100. [1]
In other words, I do different things in different situations.
Edit 1: (Or maybe I would. I haven’t figured it out yet.)
The problem only asks about what you would do in the failure case, and I think this obscures the fact that the relevant decision point is right now. If you would refuse to pay, that means that you are the type of person who would not have won had the coin flip turned out differently, either because you haven’t considered the matter (and luckily turn out to be in the situation where your choice worked out better), or because you would renege on such a commitment when it occurred in reality.
However at this point, the coin flip hasn’t been made. The globally optimal person to be right now is one that does precommit and doesn’t renege. This person will come out behind in the hypothetical case as it requires we lock ourselves into the bad choice for that situation, but by being a person who would act “irrationally” at that point, they will outperform a non-committer/reneger on average.
What if there is no “on average”, if the choice to give away the $100 is the only choice you are given in your life? There is no value in being the kind of person who globally optimizes because of the expectation to win on average. You only make this choice because it’s what you are, not because you expect the reality on average to be the way you want it to be.
From my perspective now, I expect the reality to be the winning case 50% of the time because we are told this as part of the question: Omega is trustworthy and said it tossed a fair coin. In the possible futures where such an event could happen, 50% of the time my strategy would have paid off to a greater degree than it would lose the other 50% of the time. If omega did not toss a fair coin, then the situation is different, and my choice would be too.
There is no value in being such a person if they happen to lose, but that’s like saying there’s no value in being a person who avoids bets that lose on average by only posing the 1 in several million time they would have won the lottery. On average they’ll come out ahead, just not in the specific situation that was described.
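That “on average” claim is just the expected-value calculation from the post; a quick check under the stated fair-coin assumption:

```python
# Expected value before the flip (numbers from the post): commit to paying
# $100 on tails and you win $10,000 on heads; refuse and you get nothing.
ev_committer = 0.5 * 10_000 + 0.5 * (-100)
ev_refuser = 0.5 * 0 + 0.5 * 0

assert ev_committer == 4_950.0
assert ev_refuser == 0.0
```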
I’m way late to this party, but aren’t we ignoring something obvious? Such as imperfect knowledge of how likely Omega is to be right about its prediction of what you would do? If you live in a universe where Omega is a known fact and nobody thinks themselves insane when they meet him, well, then it’s the degenerate case where you are 100% certain that Omega predicts correctly. If you lived in such a universe presumably you would know it, and everyone in that world would pre-commit to giving Omega $100, just like in ours pizza-deliverers pre-commit to not carrying more than a small amount of cash with them.
There may be other universes where Omega is known to be right and do what he says he will do 80% of the time. Or ones where there are rumors of an omniscient Omega that always makes good on his word, but you assign them 80% probability of being true. And so on.
Given the $5000 expected payoff and the $50 expected cost of precommitting, you should do it if the probability of Omega being both right and trustworthy is greater than or equal to 0.01.
But, if you, knowing what you know about THIS universe, suddenly found yourself in the presence of some alien entity making the claim Omega makes in the above scenario, what kind of evidence would you demand for this claim before assigning a probability greater than 0.01?
It occurs to me that the dude in the robe and mask pretending to be Omega could up the ante to $1000000, and if I wouldn’t assign his claim a probability above 0.01 given a $10000 payoff, it probably wouldn’t matter to me what he offered as a payoff, because if he has enough delusions and/or chutzpah to make this claim in this universe, there’s no reason for him to balk at adding on a few extra decimal places. I’m not sure how to formalize that mathematically, though.
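The break-even arithmetic from the grandparent comment can be sketched directly (its numbers: a $5000 expected payoff against a $50 expected cost):

```python
# Expected gain of precommitting, as a function of the probability p that
# Omega is both an accurate predictor and trustworthy (the comment's numbers):
# p * (0.5 * $10,000) in expected payoff, against 0.5 * $100 in expected cost.
def expected_gain(p: float) -> float:
    return p * (0.5 * 10_000) - 0.5 * 100  # p * $5000 - $50

# Break-even at p = 0.01, as the comment states.
assert abs(expected_gain(0.01)) < 1e-9
assert expected_gain(0.02) > 0 > expected_gain(0.005)
```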
Under my syntacticist cosmology, which is a kind of Tegmarkian/Almondian crossover (with measure flowing along the seemingly ‘backward’ causal relations), the answer becomes trivially “yes, give Omega the $100” because counterfactual-me exists. In fact, since this-Omega simulates counterfactual-me and counterfactual-Omega simulates this-me, the (backwards) flow of measure ensures that the subjective probabilities of finding myself in real-me and counterfactual-me must be fairly close together; consequently this remains my decision even in the Almondian variety. The purer and more elegant version of syntacticism doesn’t place a measure on the Tegmark-space at all, but that makes it difficult to explain the regularity of our universe—without a probability distribution on Tegmark-space, you can’t even mathematically approach anthropics. However, in that version counterfactual-me ‘exists to the same extent that I do’, and so again the answer is trivially “give Omega the $100”.
Counterfactual problems can be solved in general by taking one’s utilitarian summation over all of syntax-space rather than merely one’s own Universe/hubble bubble/Everett branch. The outstanding problem is whether syntax-space should have a measure and if so what its nature is (and whether this measure can be computed).
Does syntacticism work if you know Omega likes simulating poor you, and each simulated rich you is counterbalanced by many simulated poor yous? Or only in special cases like you mentioned?
Yes, it still works, because of the way the subjective probability flow on Tegmark-space works. (Think of it like PageRank, and remember that the s.p. flows from the simulated to the simulator)
It is technically possible that the differences between how much the two Universes simulate each other, when combined with differences in how much they are simulated by other Universes, can cause the coupling between the two not to be strong enough to override some other couplings, with the result that the s.p. expectation of “giving Omega the $100” is negative. However, under my current state of logical uncertainty about the couplings, that outcome is rather unlikely, so taking a further expectation over my guesses of how likely various couplings are, the deal is still a good one.
Actually, in my own thinking I no longer call it “Tegmark-space”, instead I call it the “Causality Manifold” and I’m working on trying to find a formal mathematical expression of how causal loop unfolding can work in a continuous context. Also, I’m no longer worried about the “purer and more elegant version” of syntacticism, because today I worked out how to explain the subjective favouring of regular universes (over irregular ones, which are much more numerous). One thing that does worry me, though, is that every possible Causality Manifold is also an element of the CM, which means either stupidly large cardinal axioms or some kind of variant of the “No Gödels” argument from Syntacticism (the article).
This is just the one-shot Prisoner’s Dilemma. You being split into two different possible worlds, is just like the two prisoners being taken into two different cells.
Therefore, you should give Omega $100 if and only if you would cooperate in the one-shot PD.
It is not at all the PD, for a variety of reasons, the main one being that only one of you has to make a non-obvious choice.
I don’t see the difficulty. No, you don’t win by giving Omega $100. Yes, it would have been a winning bet before the flip if, as you specify, the coin is fair. Your PS, in which you say to “assume that in the overwhelming measure of the MWI worlds it gives the same outcome”, contradicts the assertion that the coin is fair, and so you have asked us for an answer to an incoherent question.
This doesn’t sound right to me. The coin doesn’t need to be quantum mechanical to be fair. Here is a fair but perfectly deterministic coin: the 1098374928th digit of pi, mod 2. I have no idea whether it’s a zero or one. I could figure it out if you gave me enough time, as could Omega. If both of us agree not to take the time to figure it out in advance, we can use it as a fair coin. But in all Everett branches, it comes out the same way.
The difficulty comes from projecting the ideal decision theory on people. Look how many people are ready to pay up $100, so it must be a real difficulty.
The fairness of a coin is a property of your mind, not of the coin itself. The coin can be fair in a deterministic world, the same way you can have free will in a deterministic world.
Better to say that your state of knowledge about the coin, prior to Omega appearing, is that it has a probability 1/2 of being heads and 1/2 of being tails. The MWI clause is supposed to make the problem harder by preventing you from assigning utility (once Omega appears) to your ‘other selves’ in other Everett branches. The problem is then just: “how, knowing that Omega might appear, but not knowing what the coin flip will be, can I maximise my utility?” If Omega appears in front of you right now then that’s a different question.
My state of knowledge about the coin prior to Omega appearing is that I don’t even know that the coin is going to be flipped, actually.
No, it’s a clear loss.
The only winning scenario is, “the coin comes down heads and you have an effective commitment to have paid if it came down tails.”
By making a binding precommitment, you effectively gamble that the coin will come down heads. If it comes down tails instead, clearly you have lost the gamble. Giving the $100 when you didn’t even make the precommitment would just be pointlessly giving away money.
I realise I’m coming to this a little late, but I’m a little unclear about this case. This is my understanding:
When you ask me if I should give Omega the $100, I commit to “yes”, because I am the agent who might meet Omega one day, and since I am in fact at the time before the coin has been flipped right now, by the usual expected-value calculations the rational choice is to commit.
So does that mean that if I commit now (e.g. by giving myself a monetary incentive to give the $100), and my friend John meets Omega tomorrow who has flipped the coin and it has landed tails, I should tell him that the rational choice is to not give the $100, since he is deciding after the coin toss?
Would anyone be so kind as to tell me if that seems right?
Well, this comes up different ways under different interpretations. If there is a chance that I am being simulated (that is, this is part of his determining my choice), then I give him $100. If the coin is quantum, so that there will exist other mes getting the money, I give him $100. If there is a chance that I will encounter similar situations again, I give him $100. If I were informed of the deal beforehand, I give him $100. Given that I am not simulated, given that the coin is deterministic, and given that I will never again encounter Omega, I don’t think I give him $100. Seeing as I can treat this entirely in isolation due to these conditions, I have the choice between -$100 and $0, of which the second option is better. Now, this runs into some problems. If I were informed of it beforehand, I should have precommitted. Seeing as my choices given all information shouldn’t change, this presents a difficulty. However, due to the uniqueness of this deal, there really does seem to be no benefit to any mes from giving him the money, and so it is purely a loss.
No precommittment, no deal.
Suppose Omega gives you the same choice, but says that if a head had come up, it would have killed you, but only if you {would have refused|will refuse} to give it your lousy $100 {if the coin had come up heads|given that the coin has come up heads}. Not sure what the correct tense is, here.
I believe that I would keep the $100 in your problem, but give it up in mine.
ETA: Can you clarify your postscript? Presumably you don’t want the knowledge about the distribution of coin-flip states across future Everett branches to be available for the purposes of the expected utility calculation?
I’m trying to set up a sufficiently inconvenient possible world by introducing additional assumptions. The one about MWI stops the excuse that there are other real yous in the other MWI branches who do receive the $10000. Not allowed.
How do you pick the threshold, decide that [$10000] < [decision threshold] < [your life]?
You’ve actually made it an easier problem for me, though, because I regard my alternate selves as other people.
How do you pick the threshold, decide that [$10000] < [decision threshold] < [your life]?
If it were possible for me to make a deal with my alternate self by which I get a few thousand dollars, I would obviously surrender my $100. As it isn’t possible, I see little reason to give someone otherwise destined to be forever causally isolated from me $10000 at the cost of $100. I wouldn’t keep $100 if it meant he lost $10000, either. I probably would keep the $100 if they lost less than $100. If my alternate self stood to gain, say, a million dollars, but nothing if I kept my $100, then I probably would give it up. But that would be as a whimsy, something to think about and feel good. But the benefit to me of that whimsy would have to be worth more than $100.
The pattern behind my choices is that the pain experienced by my alternate self (who, recall, I consider a different person) in any of these cases is never more than $100. I think this is the most we can expect, on average, of other intelligent beings: that they will not inflict a large loss for a small gain. Why not steal, in that case? Because there is, in fact, no such thing as total future causal isolation.
There is no alternative self. None at all. The alternative may be impossible according to the laws of physics. It is only present in your imperfect model of the world. You can’t trade with a fiction, and you shouldn’t empathize with a fiction. What you decide, you decide in this our real world. You decide that it is right to make a sacrifice, according to your preferences that live only in your model of the world, but speak about the reality.
I think that this is a critical point, worthy of a blog post of its own. Impossible possible worlds are a confusion.
The inclination to trade with fiction seems like a serious problem within this community.
I’ve misunderstood you to an extent, then.
My preferences don’t involve me sacrificing unless someone can get hurt. It doesn’t matter whether that person exists in another Everett branch, within Omega or in another part of the Tegmark ensemble, but there must be a someone. I’ll play symmetrist with everyone else (which is, in a nutshell, what I said in my comment above) but not with myself. You seem to want a person that is me, but minus the “existence” property. I don’t think that is a coherent concept.
OK, suppose that Omega came along right now and said to me “I have determined that if you could be persuaded that your actions would have no consequence, and then given the problem you are currently discussing, you would in every case keep $100. Therefore I will torture you endlessly.” I would not see this as proof of my irrationality (in the sense of hopelessly failing to achieve my preferences). I don’t think that such a sequence of events is germane to the problem as you see it, but I also don’t see how it is not germane.
How much do you know about many worlds, anyways? My alternate self very much does exist, the technical term is possibility-cloud which will eventually diverge noticeably but which for now is just barely distinguishable from me.
there you go.
Vladimir_Nesov!2009 knew more than enough about Many Worlds to know how to exclude it as a consideration. Vladimir_Nesov!2013 probably hasn’t forgotten.
No. It doesn’t exist. Not all uncertainty represents knowledge about quantum events which will have significant macroscopic relevance. Some represents mere ignorance. This ignorance can be about events that are close to deterministic—that means the ‘alternate selves’ have negligible measure and even less decision theoretic relevance. Other uncertainty represents logical uncertainty. That is, where the alternate selves don’t even exist in the trivial irrelevant sense. It was just that the participant didn’t know that “2+2=4” yet.
There may be fewer of those than you realize.
Given that many-worlds is true, yes. Invoking it kind of defeats the purpose of the decision theory problem though, as it is meant as a test of reflective consistency (i.e. you are supposed to assume you prefer $100>$0 in this world regardless of any other worlds).
Ok so there’s a good chance I’m just being an idiot here, but I feel like a many-worlds kind of interpretation serves well here. If, as you say, “the coin is deterministic, [and] in the overwhelming measure of the MWI worlds it gives the same outcome,” then I don’t believe the coin is fair. And if the coin isn’t fair, then of course I’m not giving Omega any money. If, on the other hand, the coin is fair, and so I have reason to believe that in roughly half of the worlds the coin landed on the other side and Omega posed the opposite question, then by giving Omega the $100 I’m giving the me in those other worlds $10000 and I’m perfectly happy to do that.
Not sure how to delete, but this was meant to be a reply.
I think that what really does my head in about this problem is, although I may right now be motivated to make a commitment, because of the hope of winning the 10K, nonetheless my commitment cannot rely on that motivation, because when it comes to the crunch, that possibility has evaporated and the associated motivation is gone. I can only make an effective commitment if I have something more persistent—like the suggested $1000 contract with a third party. Without that, I cannot trust my future self to follow through, because the reasons that I would currently like it to follow through will no longer apply.
MBlume stated that if you want to be known as the sort of person who’ll do X given Y, then when Y turns up, you’d better do X. That’s a good principle—but it too can’t apply, unless at the point of being presented with the request for $100, you still care about being known as that sort of person—in other words, you expect a later repetition of the scenario in some form or another. This applies as well to Eliezer’s reasoning about how to design a self-modifying decision agent—which will have to make many future decisions of the same kind.
Just wanting the 10K isn’t enough to make an effective precommitment. You need some motivation that will persist in the face of no longer having the possibility of the 10K.
It seems to me the answer becomes more obvious when you stop imagining the counterfactual you who would have won the $10000, and start imagining the 50% of superpositions of you who are currently winning the $10000 in their respective worlds.
Every implementation of you is you, and half of them are winning $10000 as the other half lose $100. Take one for the team.
Sorry, but I’m not in the habit of taking one for the quantum superteam. And I don’t think that it really helps to solve the problem; it just means that you don’t necessarily care so much about winning any more. Not exactly the point.
Plus we are explicitly told that the coin is deterministic and comes down tails in the majority of worlds.
If you’re not willing to “take one for the team” of superyous, I’m not sure you understand the implications of “every implementation of you is you.”
It does solve the problem, though, because it’s a consistent way to formalize the decision so that on average for things like this you are winning.
I think you’re missing the point here. Winning in this case is doing the thing that on average nets you the most success for problems of this class, one single instance of it notwithstanding.
And this explains why you’re missing the point. We are told no such thing. We are told it’s a fair coin and that can only mean that if you divide up worlds by their probability density, you win in half of them. This is defined.
What seems to be confusing you is that you’re told “in this particular problem, for the sake of argument, assume you’re in one of the worlds where you lose.” It states nothing about those worlds being over represented.
No, take another look:
Does this particular thought experiment really have any practical application?
I can think of plenty of similar scenarios that are genuinely useful and worth considering, but all of them can be expressed with much simpler and more intuitive scenarios—eg when the offer will/might be repeated, or when you get to choose in advance whether to flip the coin and win 10000/lose 100. But with the scenario as stated—what real phenomenon is there that would reward you for being willing to counterfactually take an otherwise-detrimental action for no reason other than qualifying for the counterfactual reward? Even if we decide the best course of action in this contrived scenario—therefore what?
Precommitments are used in decision-theoretic problems. Some people have proposed that a good decision theory should take the action that it would have precommitted to, if it had known in advance to do such a thing. This is an attempt to examine the consequences of that.
Yes, but if the artificial scenario doesn’t reflect anything in the real world, then even if we get the right answer, therefore what? It’s like being vaccinated against a fictitious disease; even if you successfully develop the antibodies, what good do they do?
It seems to me that the “beggars and gods” variant mentioned earlier in the comments, where the opportunity repeats itself each day, is actually a more useful study. Sure, it’s much more intuitive; it doesn’t tie our brains up in knots, trying to work out a way to intend to do something at a point when all our motivation to do so has evaporated. But reality doesn’t have to be complicated. Sometimes you just have to learn to throw in the pebble.
Decision theory is an attempt to formalize the human decision process. The point isn’t that we really are unsure whether you should leave people to die of thirst, but how we can encode that in an actual decision theory. Like so many discussions on Less Wrong, this implicitly comes back to AI design: an AI needs a decision theory, and that decision theory needs to not have major failure modes, or at least the failure modes should be well-understood.
If your AI somehow assigns a nonzero probability to “I will face a massive penalty unless I do this really weird action”, that ideally shouldn’t derail its entire decision process.
The beggars-and-gods formulation is the same problem. “Omega” is just a handy abstraction for “don’t focus on how you got into this decision-theoretic situation”. Admittedly, this abstraction sometimes obscures the issue.
I don’t think so; I think the element of repetition substantially alters it—but in a good way, one that makes it more useful in designing a real-world agent. Because in reality, we want to design decision theories that will solve problems multiple times.
At the point of meeting a beggar, although my prospects of obtaining a gold coin this time around are gone, nonetheless my overall commitment is not meaningless. I can still think, “I want to be the kind of person who gives pennies to beggars, because overall I will come out ahead”, and this thought remains applicable. I know that I can average out my losses with greater wins, and so I still want to stick to the algorithm.
In the single-shot scenario, however, my commitment becomes worthless once the coin comes down tails. There will never be any more 10K; there is no motivation any more to give 100. Following my precommitment, unless it is externally enforced, no longer makes any sense.
So the scenarios are significantly different.
This is the point of the thought experiment.
Omega is a predictor. His actions aren’t just based on what you decide, but on what he predicts that you will decide.
If your decision theory says “nah, I’m not paying you” when you aren’t given advance warning or repeated trials, then that is a fact about your decision theory even before Omega flips his coin. He flips his coin, gets heads, examines your decision theory, and gives you no money.
But if your decision theory pays up, then if he flips tails, you pay $100 for no possible benefit.
Neither of these seems entirely satisfactory. Is this a reasonable feature for a decision theory to have? Or is it pathological? If it’s pathological, how do we fix it without creating other pathologies?
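The payoff structure described in the last few comments can be sketched as a toy expected-value calculation (the model and the policy parameter are illustrative, not from the thread; Omega inspects the policy before the flip, which is why the heads payoff depends on counterfactual behavior on tails):

```python
# Toy model of the two decision theories facing Omega's coin flip.
# Omega inspects the agent's policy *before* the flip, so the payoff
# on heads depends on what the agent would do on tails.

def expected_value(pays_on_tails: bool) -> float:
    p_heads = p_tails = 0.5
    heads_payoff = 10_000 if pays_on_tails else 0  # Omega pays only predicted payers
    tails_payoff = -100 if pays_on_tails else 0    # payers hand over $100 on tails
    return p_heads * heads_payoff + p_tails * tails_payoff

print(expected_value(True))   # 4950.0: the payer policy, matching the head post
print(expected_value(False))  # 0.0: the refuser policy
```

This reproduces the $4950 figure from the head post, and makes the tension concrete: the payer policy wins in expectation before the flip, yet every dollar it ever hands over is handed over at a moment when no benefit remains.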
But in the single-shot scenario, after it comes down tails, what motivation does an ideal game theorist have to stick to the decision theory?
Like Parfit’s hitchhiker, although in advance you might agree that it’s a worthwhile deal, when it comes to the point of actually paying up, your motivation is gone, unless you have bound yourself in some other way.
That’s what the problem is asking!
This is a decision-theoretical problem. Nobody cares about it for immediate practical purpose. “Stick to your decision theory, except when you non-rigorously decide not to” isn’t a resolution to the problem, any more than “ignore the calculations since they’re wrong” was a resolution to the ultraviolet catastrophe.
Again, the point of this experiment is that we want a rigorous, formal explanation of exactly how, when, and why you should or should not stick to your precommitment. The original motivation is almost certainly in the context of AI design, where you don’t HAVE a human homunculus implementing a decision theory, the agent just is its decision theory.
Well, if we’re designing an AI now, then we have the capability to make a binding precommitment, simply by writing code. And we are still in a position where we can hope for the coin to come down heads. So yes, in that privileged position, we should bind the AI to pay up.
However, to the question as stated, “is the decision to give up $100 when you have no real benefit from it, only counterfactual benefit, an example of winning?” I would still answer, “No, you don’t achieve your goals/utility by paying up.” We’re specifically told that the coin has already been flipped. Losing $100 has negative utility, and positive utility isn’t on the table.
Alternatively, since it’s asking specifically about the decision, I would answer, If you haven’t made the decision until after the coin comes down tails, then paying is the wrong decision. Only if you’re deciding in advance (when you still hope for heads) can a decision to pay have the best expected value.
Even if deciding in advance, though, it’s still not a guaranteed win, but rather a gamble. So I don’t see any inconsistency in saying, on the one hand, “You should make a binding precommitment to pay”, and on the other hand, “If the coin has already come down tails without a precommitment, you shouldn’t pay.”
If there were a lottery where the expected value of a ticket was actually positive, and someone comes to you offering to sell you their ticket (at cost price), then it would make sense in advance to buy it, but if you didn’t, and then the winners were announced and that ticket didn’t win, then buying it no longer makes sense.
You’re fundamentally failing to address the problem.
For one, your examples just plain omit the “Omega is a predictor” part, which is key to the situation. Since Omega is a predictor, there is no distinction between making the decision ahead of time or not.
For another, unless you can prove that your proposed alternative doesn’t have pathologies just as bad as the Counterfactual Mugging, you’re at best back to square one.
It’s very easy to say “look, just don’t do the pathological thing”. It’s very hard to formalize that into an actual decision theory without creating new pathologies. I feel obnoxious repeating this, but that is the entire problem in the first place.
Except that even if you make the decision, what would motivate you to stick to it once it can no longer pay off?
Your only motivation to pay is the hope of obtaining the $10000. If that hope does not exist, what reason would you have to abide by the decision that you make now?
Your decision is a result of your decision theory, and your decision theory is a fact about you, not just something that happens in that moment.
You can say—I’m not making the decision ahead of time, I’m waiting until after I see that Omega has flipped tails. In which case, when Omega predicts your behavior ahead of time, he predicts that you won’t decide until after the coin flip, resulting in hypothetically refusing to pay given tails, so—although the coin flip hasn’t happened yet and could still come up heads—your yet-unmade decision has the same effect as if you had loudly precommitted to it.
You’re trying to reason in temporal order, but that doesn’t work in the presence of predictors.
I get that that could work for a computer, because a computer can be bound by an overall decision theory without attempting to think about whether that decision theory still makes sense in the current situation.
I don’t mind predictors in eg Newcomb’s problem. Effectively, there is a backward causal arrow, because whatever you choose causes the predictor to have already acted differently. Unusual, but reasonable.
However, in this case, yes, your choice affects the predictor’s earlier decision—but since the coin never came down heads, who cares any more how the predictor would have acted? Why care about being the kind of person who will pay the counterfactual mugger, if there will never again be any opportunity for it to pay off?
Yes, that is the problem in question!
If you want the payoff, you have to be the kind of person who will pay the counterfactual mugger, even once you no longer can benefit from doing so. Is that a reasonable feature for a decision theory to have? It’s not clear that it is; it seems strange to pay out, even though the expected value of becoming that kind of person is clearly positive before you see the coin. That’s what the counterfactual mugging is about.
If you’re asking “why care” rhetorically, and you believe the answer is “you shouldn’t be that kind of person”, then your decision theory prefers lower expected values, which is also pathological. How do you resolve that tension? This is, once again, literally the entire problem.
Well, as previously stated, my view is that the scenario as stated (single-shot with no precommitment) is not the most helpful hypothetical for designing a decision theory. An iterated version would actually be more relevant, since we want to design an AI that can make more than one decision. And in the iterated version, the tension is largely resolved, because there is a clear motivation to stick with the decision: we still hope for the next coin to come down heads.
Are you actually trying to understand? At some point you’ll predictably approach death, and predictably assign a vanishing probability to another offer or coin-flip coming after a certain point. Your present self should know this. Omega knows it by assumption.
I’m pretty sure that decision theories are not designed on that basis. We don’t want an AI to start making different decisions based on the probability of an upcoming decommission. We don’t want it to become nihilistic and stop making decisions because it predicted the heat death of the universe and decided that all paths have zero value. If death is actually tied to the decision in some way, then sure, take that into account, but otherwise, I don’t think a decision theory should have “death is inevitably coming for us all” as a factor.
You are wrong. In fact, this is a totally standard thing to consider, and “avoid back-chaining defection in games of fixed length” is a known problem, with various known strategies.
So say it’s repeated. Since our observable universe will end someday, there will come a time when the probability of future flips is too low to justify paying if the coin lands tails. Your argument suggests you won’t pay, and by assumption Omega knows you won’t pay. But then on the previous trial you have no incentive to pay, since you can’t fool Omega about your future behavior. This makes it seem like non-payment propagates backward, and you miss out on the whole sequence.
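The unraveling argument above can be sketched with a backward-induction toy model (purely illustrative: it assumes an agent that pays in a round only to protect payoffs in later rounds, which is exactly the reasoning this comment attributes to the non-precommitting agent):

```python
# Backward-induction sketch of the unraveling argument: a "causal"
# agent pays on tails in round i only if being predicted as a payer
# still protects payoffs in some later round. There is no round after
# the last, so it refuses there; Omega foresees this, the incentive
# in the next-to-last round vanishes too, and the refusal cascades back.

def rounds_where_agent_pays(n_rounds: int) -> list:
    pays = [False] * n_rounds
    for i in reversed(range(n_rounds)):
        # Pay only if some later round is still worth protecting.
        pays[i] = any(pays[i + 1:])
    return pays

print(rounds_where_agent_pays(5))  # [False, False, False, False, False]
```

Non-payment propagates backward from the final round to the first, so this kind of agent misses out on the whole sequence, just as the comment argues.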
I wouldn’t trust myself to accurately predict the odds of another repetition, so I don’t think it would unravel for me. But this comes back to my earlier point that you really need some external motivation, some precommitment, because “I want the 10K” loses its power as soon as the coin comes down tails.
The only mechanism I know of by which Omega can accurately predict me without introducing paradoxes is to run something like a simulation, as others have suggested. But I really, truly, only care about the universe I happen to know about, and for the life of me I can’t figure out why I should care about any other. So even if the universe I perceive really is just a simulation run so that Omega can figure out what I would do in this situation, I don’t understand why I should care about “my” utility in some other universe. So: two-box, keep my $100.
Edit: I should add that my not caring about other universes is conditional on my having no reason to believe they exist.
Ah. But under mild assumptions about how Omega’s simulation works, I can expect that with some probability p bounded away from zero, I am in a simulation. So with probability at least p, there is another universe I care about, and I can increase utility there.
So, I guess I do pay $100, but only because my utility function values the utility of others. I remain unconvinced that paying is winning for someone with a different utility function.
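The simulation argument in the two comments above amounts to a small expected-value calculation. Here is an illustrative sketch; both the probability of being the simulation and the weight placed on the other universe’s utility are assumptions of this sketch, not figures from the thread:

```python
# Illustrative sketch of the simulation argument. p_simulated (the
# probability that I am Omega's simulation) and care_weight (how much
# I value the other universe's utility) are assumed parameters.

def ev_of_paying(p_simulated: float, care_weight: float) -> float:
    real_cost = -(1 - p_simulated) * 100           # I'm real: I just lose $100
    sim_gain = p_simulated * care_weight * 10_000  # I'm the sim: real-me gains $10,000
    return real_cost + sim_gain

print(ev_of_paying(0.5, 1.0))  # 4950.0: caring fully, paying looks good
print(ev_of_paying(0.5, 0.0))  # -50.0: caring not at all, it doesn't
```

This makes the commenter’s caveat explicit: whether paying is “winning” hinges entirely on the care-weight term in the utility function, not on the probabilities alone.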
I have one minor question about this problem: would I be allowed to, say, offer Omega $50 instead of the $100 he asked for, in exchange for $5000 and the promise that, had the coin landed heads, it would give me $5000 and ask me for $50? He (I’m going to refer to all sentients as “he”, so I don’t have to keep figuring out whether the person I’m talking about is he, she, or it) would know to do this, since Omega would simulate the me from the tails branch, and that simulated me would offer him this proposition. It should not be too difficult to accept, given that the cost to Omega is basically zero across all possibilities, unless part of the point of the exercise is to mess with me.
In the event that he rejects this offer, I’m going to give him $100, and then mug him. (Assuming, of course, that the probability of my succeeding in the mugging is not 0. If he were to, say, kill me in response to the mugging, I’d simply have the copy of me that succeeded in mugging him force him to use some of his powers to resurrect my less fortunate copies, assuming that power is part of his omnipotence; otherwise I wouldn’t mug him, there being no point in betting my life on something that doesn’t generate more of my life (given that, if I succeeded, I could have him make all copies of me immortal). Of course, if he had that power, there’s a chance his retaliation might be to simply wipe out all copies of me in all existences; in that case, the probability of success should be computed as a negative value, given that I CAN fail more times than I try.) In the cases where I succeed in the mugging, I’d get at least my $100 back, and in the cases where I fail, I doubt I’d care about $100.
In the case that neither of the above is possible, I would not give him the $100, given that the diminishing returns on increasing amounts of money might well make the $10000 worth less utility than 2x instances of $100. (The 2x instances of $100 scale linearly, whereas each additional $100 within the $10000 diminishes in value. That is, each instance of $100 is worth just as much as a prior instance, since it’s distributed among different copies of me, so diminishing returns don’t kick in; the $10000, by contrast, all goes to one instance. It should be obvious that I prefer a sure $50 to a 50% chance of $100.)
Of course, due to the above, there’s a fourth possibility: one where the iteration of me being offered the choice is itself much affected by the diminishing returns on the value of money. In that case, I would give $100 to Omega, since this action would partially smooth out the differing amounts of wealth among multiple copies of me across worlds. Or rather, it would diminish the number of mes who are “poorer,” since the copies that need the money do not give up $100 but will receive some regardless; unless that doesn’t work out because Omega simulates the exact version of me, current financial assets included, which rather nullifies his capabilities as an interdimensional arbitrageur among my copies. But at that point, the diminishing returns on money should be such that each additional $100 is roughly equal in value, since diminishing returns ALSO suffer from diminishing returns, with increasing amounts of diminishing returns diminishing returns less.
In short, the options are: I offer him $50 for a constant $5000 across all outcomes of the coin flip, and he accepts; I give him $100 and then mug him; I do not give him $100, if diminishing returns are not yet themselves much affected by diminishing returns; or I give him $100, if they are.
We have to presume you can’t just mug Omega. (He is omniscient, may as well make him omnipotent too.) Otherwise the problem is totally different.
Given what you can do with omniscience that’s not much of a stretch!
I am unable to see how this boils down to anything but a moral problem (and therefore with no objective solution).
Compare this to a simple lost bet. Omega tells you about the deal, you agree, and then he flips a coin, which comes out tails. Why exactly would you pay the $100 in this example?
Because someone will punish/ostracise me if I renege (or other external consequences)? Then in the CM case all that matters is what the consequences are for your payment/refusal.
Because I have an absolute/irrational/moral desire to hold to my word? Then the only question is whether your definition of “my word” (or, more generally, your self-imposed moral obligations) includes counterfactual promises. But this is only a matter of choosing the boundaries of your arbitrary moral guidelines. It is hardly more solvable or more interesting than asking if you would consider yourself morally beholden to a promise you made when you were four years old.
Uniqueness raises all sorts of problems for decision theory, because expected utility implicitly assumes many trials. This may just be another example of that general phenomenon.
How do I know that? I would assign a lower prior probability to that than to waking up tomorrow with a blue tentacle instead of my right arm; so, in such a situation, I would just believe Omega is bullshitting me.
See Least convenient possible world. These technical difficulties are irrelevant to the problem itself.
It does seem like a legitimate issue though, that a decision theory that deals with the least convenient possible world manifestation of the Counterfactual Mugging scenario is not necessarily well adapted in general.
When to believe what claims is a completely separate issue. We are looking at a thought experiment to get a better idea about what kinds of considerations should be taken into account in general, not to build a particular agent that does well in this situation (and possibly worse in others).
Is the scenario really isomorphic to any sort of real life dilemma though? An agent which commits to paying out the $100 could end up being screwed over by an anti-Omega, which would pay out $10,000 only to a person who wouldn’t give Omega the $100. I’m not clear on what sort of general principles the thought experiment is supposed to illustrate.
Start from assuming that the agent justifiably knows that the thought experiment is set up as it’s described.
Do they know before being confronted by Omega, or only once confronted?
If they did not know in advance that it’s more likely for Omega to appear and conduct the counterfactual mugging than it is for anti-Omega to appear and reward those who wouldn’t cooperate on the counterfactual mugging, then I can’t see that there’s any point in time where the agent should expect greater utility by committing to cooperate on the counterfactual mugging. If they do know in advance, then it’s better to precommit.
It’s an assumption of the thought experiment that the player justifiably learns about the situation after the coin is tossed, that they are dealing with Omega and not “anti-Omega” and somehow learn that to be the case.
In that case, it doesn’t seem like there’s any point in time where a decision to cooperate should have a positive expected utility.
Correctness of decisions doesn’t depend on current time or current knowledge.
Precommitting should be, as someone already said, signing a paper with a third party agreeing to give them $1000 in case you fail to give the $100 to Omega. Precommitment means you have no other option. You can’t say that you both precommitted to give the $100 AND refused to do it when presented with the case.
Which means, if Omega presents you with the scenario before the coin toss, you precommit (by signing the contract with the third party). If Omega presents you with the scenario after the coin toss AND also tells you it has already come up tails—you haven’t precommited, therefore you shouldn’t give it $100.
EDIT: Also, some people objected to not giving the $100, because they might be the emulation which Omega uses to predict whether you’d really give money. If you were an emulation, then you would remember precommitting in expectation to get $10,000 with a 50% chance. It makes no sense for Omega to emulate you in a scenario where you don’t get a chance to precommit.
That level of precommitting is only necessary if you are unable to trust yourself to carry through with a self-imposed precommitment. If you are capable of it, you can decide now to act irrationally in certain future decisions, and thereby benefit to a greater degree than someone who can’t. If the temptation to go back on your self-promise is too great in the failure case, then you would have lost in the win case: you are simply a fortunate loser who found out the flaw in his promise in the case where being flawed was beneficial. It doesn’t change the fact that being capable of this decision is a better strategy on average. Making yourself conditionally less rational can actually be a rational decision, and so the ability to do so can be a strength worth acquiring.
Ultimately the problem is the same as that of an ultimatum (eg. MAD). We want the other party to believe we will carry through even if it would be clearly irrational to do so at that point. As your opponent becomes better and better at predicting, you must become closer and closer to being someone who would make the irrational decision. When your opponent is sufficiently good (or you have insufficient knowledge as to how they are predicting), the only way to be sure is to be someone who would actually do it.
Okay, I agree that this level of precommitting is not necessary. But if the deal really is a one-time offer, then, when presented with the case of the coin already having come up tails, you can no longer ever benefit from being the sort of person who would precommit. Since you will never again be presented with a Newcomb-like scenario, you will get no benefit from being the precommitting type. Therefore you shouldn’t give the $100.
If, on the other hand, you still expect that you can encounter some other Omega-like thing which will present you with such a scenario, doesn’t this make the deal repeatable, which is not how the question was formulated?
In a repeatable deal your action influences the conditions in the next rounds. Even if you defect in this round, you may still cooperate in the next rounds, Omegas aren’t looking back at how you decided in the past, and don’t punish you by not offering the deals. Your success in the following rounds (from your current point of view) depends on whether you manage to precommit to the future encounters, not on what you do now.
In the repeatable scenario I believe, unlike Vladimir, that a real difference exists. Whatever decision process you use to decide not to pay $100 in one round, you can predict with high probability that that same process will operate in future rounds as well, leading to a total gain to you of about $0. On the other hand, you know that if your current decision process leads you to give $100 in this case, then with high probability that same process will operate in future rounds, leading to a total gain to you of about $4950 x expected future rounds. Therefore, if you place higher confidence in your ability to predict your future actions from your current ones than you do in your own reasoning process, you should give up the $100. This makes the problem rather similar to the original Newcomb’s problem, in that you assign a higher probability to your reasoning being wrong whenever it leads you to two-box than you assign to the correctness of any reasoning that leads you to two-box.
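The per-round figures in this comment can be checked with a small Monte Carlo sketch (round count and seed are arbitrary choices of this sketch): a consistent payer averages roughly $4950 per round, a consistent refuser roughly $0.

```python
import random

# Monte Carlo sketch of the repeated game: each round, a fair coin is
# flipped; on heads, Omega pays agents it predicts would pay on tails;
# on tails, a paying agent hands over $100.

def average_gain(pays: bool, rounds: int = 100_000, seed: int = 0) -> float:
    rng = random.Random(seed)
    total = 0
    for _ in range(rounds):
        if rng.random() < 0.5:              # heads: Omega pays predicted payers
            total += 10_000 if pays else 0
        elif pays:                          # tails: payers hand over $100
            total -= 100
    return total / rounds

print(round(average_gain(True)))   # near 4950
print(round(average_gain(False)))  # 0
```

The simulation agrees with the comment’s arithmetic: the gap between the two policies compounds with every expected future round, which is exactly why the repeated version feels so much less paradoxical than the single shot.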
This is a self-deception technique. If you think it’s morally OK to deceive your future self for your current selfish ends, then by all means go ahead. Also, it looks like violent means of precommitment should actually be considered immoral, on par with forcing some other person to do your bidding by hiring a killer to kill them if they don’t comply.
In the Newcomb’s problem, it actually is in your self-interest to one-box. Not so in this problem.
I am fairly sure that it isn’t, but demonstrating so would require another maths-laden article, which I anticipate would be received similarly to my last. I will however email you my entire reasoning if you so wish (you will have to wait several days while I brush up on the logical concept of common knowledge). (I don’t know how to encode a ) in a link, so please add one to the end.)
Common knowledge (I used the %29 ASCII code for ”)”).
I’m going to write up my new position on this topic. Nonetheless I think it should be possible to discuss the question in a more concise form, since I think the problem is one of communication, not rigor. You deceive your future self (that’s the whole point of the comment above): you make it believe that it wants to take an action that it actually doesn’t. The only disagreeing position I expect is the claim that no, the future self actually does want to follow that action.
I think the problem with your article wasn’t that it was math-laden, but that you didn’t introduce things in sufficient detail to follow along, and to see the motivation behind the math.
To be perfectly honest, your last sentence is also my feeling. I should at the least have talked more about the key equation. But the article was already long, I was unsure as to how it would be received, and I spent too little time revising it (this is a persistent problem for me). If I were to write it again now, it would have been closer in style to the thread between you and me there.
If you intend to write another post, then I am happy to wait until then to introduce the ideas I have in mind, and I will try hard to do so in a manner that won’t alienate everyone.
If you think that through and decide that way, then your precommitting method didn’t work. The idea is that you must somehow now prevent your future self from behaving rationally in that situation—if they do, they will perform exactly the thought process you describe. The method of doing so, whether making a public promise (and valuing your spoken word more than $100), hiring a hitman to kill you if you renege or just having the capability of reliably convincing yourself to do so (effectively valuing keeping faith with your self-promise more than $100) doesn’t matter so long as it is effective. If merely deciding now is effective, then that is all that’s needed.
If you do then decide to take the rational course in the losing coinflip case, it just means you were wrong by definition about your commitment being effective. Luckily in this one case, you found it out in the loss case rather than the win case. Had you won the coin flip, you would have found yourself with nothing though.
How do you verify that “Omega” really is Omega and not a drunk in a bar? I can’t think of a way of doing it—so it sounds like a fraud to me.
Why is the Omega asking me, when it already knows my answer? So what happens to Omega/the universe when I say no?
If he asks me the question, I have already answered the question, so I don’t need to post this comment. I acted as I did. But I didn’t act as I did (Omega hasn’t shown up in my part of the universe), so we all know my answer.
If some guy walked up to you and gave you this spiel, you’d be fully justified in telling him to get lost, or even seeking mental help for him.
The problem assumes Omega to be genuine, and trustworthy.
Wow, this Reddit software is pretty neat for a blog.
I’d love to see a post on the best introductory books to logic, and also epistemology. Epistemology, especially, seems to lack good introductory texts.
I know this is off-topic, but I feel duty-bound to respond (in the absence of profile pages or a really working direct message functionality).
“Epistemology: the big questions” by Blackwell publishing is awesome.
Introductory logic texts are easy to find, but Hurley’s “A Concise Introduction to Logic” comes recommended, depending on what sort of intro you’re looking for.
This doesn’t go here. I’m not sure where it goes—we don’t have open threads yet.
You might want to try Jaynes though.
If you want to respond to this, please make it a private message—this thread should be for discussing the post.
This is actually a parable on the boundaries of self (think a bit Buddhist here). Let me pose this another way: late last night in the pub, my past self committed to a drunken bet of $100 vs. $200 on the flip of a coin (the other guy was even more drunk than I was). My past self lost, but didn’t have the money. This morning, my present self gets a phone call from the person it lost to. Does it honor the bet? Assuming, as is typical in these hypothetical problems, that we can ignore the consequences (else we’d have to assign a cost to them that might well offset the gains, so we’ll just assign 0 and not consider them), a utilitarian approach says I should default on the bet if I can get away with it. Why should I be responsible for what I said yesterday?
However, as usual in utilitarian dilemmas, the effect that we get in real-life is that we have a conscience—can I live with myself being the kind of person that doesn’t honor past commitments? So, most people will, out of one consideration or another, not think twice about paying up the $100.
Of Omega it is said that I can trust it more than I would myself. It knows more about me than I do myself. It would be part of myself if I didn’t consider it separate from myself. If I consider my ego and Omega part of the same all-encompassing self, then honoring the commitment that Omega made on my behalf should draw the same response as if I had made it myself. Only if I perceive Omega as a separate entity to whom I am not morally obligated can I justify not paying the $100. Only with this individualist viewpoint will I see someone to whom I am not obligated in any way demanding $100 of me.
If you manage to instill your AI with a sense of the “common good”, a sense of brotherhood of all intelligent creatures, then it will, given the premises of trust etc., cooperate in this brotherhood—in fact, that is what I believe would be one of the meanings of “friendly”.
Your version of the story discards the most important ingredient: The fact that when you win the coin toss, you only receive money if you would have paid had you lost.
As for Omega, all we know about it is that somehow it can accurately predict your actions. For the purposes of Counterfactual Mugging we may as well regard Omega as a mindless robot which will burn the money you give to it and then self-destruct immediately after the game. (This makes it impossible to pay because you feel obligated to Omega. In fact, the idea is that you pay up because you feel obligated to your counterfactual self.)
I don’t see how your points apply: I would have paid had I lost. Except if my hypothetical self is so much in debt that it can’t reasonably spend $100 on an investment such as this—in which case Omega would have known in advance, and understands my nonpayment.
I do not consider the future existence of Omega as a factor at all, so it doesn’t matter whether it self-destructs or not. And it is also a given that Omega is absolutely trustworthy (more than I could say for myself).
My view is that this may well be one of the undecidable propositions that Gödel showed must exist in any sufficiently complex formal system. The only way to make it decidable is to think outside the box, and in this case that means considering that someone else is somehow still “me” (at least in ethical respects). There are other threads here that involve splitting myself and still somehow remaining the same person, so it’s not intrinsically irrational or anything. My reference to Buddhism was merely meant to show that the concept is mainstream enough to be part of a major world religion, though most other religions and the UN charter of human rights have it as well, if less pronounced, as “brotherhood”: not a factual, but an ethical identity.
After a good night’s sleep, here are some more thoughts:
To feel obligated to my counterfactual self, which exists only in the “mind” of Omega, and not feel obligated to Omega doesn’t make any sense to me.
Your additional assumptions about Omega destroy the utility that the $100 had—in the original version, $100 is $100 to both me and Omega, but in your version it is nothing to Omega. Your amended version of the problem amounts to “would I throw $100 into an incinerator on the basis of some thought experiment”, and that is clearly not even a zero-sum game if you consider the whole system—the original problem is zero-sum, and that gives me more freedom of choice.