This might be old news to everyone “in”, or just plain obvious, but a couple days ago I got Vladimir Nesov to admit he doesn’t actually know what he would do if faced with his Counterfactual Mugging scenario in real life. The reason: if today (before having seen any supernatural creatures) we intend to reward Omegas, we will lose for certain in the No-mega scenario, and vice versa. But we don’t know whether Omegas outnumber No-megas in our universe, so the question “do you intend to reward Omega if/when it appears” is a bead jar guess.
The caveat is of course that Counterfactual Mugging or Newcomb's Problem are not to be analyzed as situations you encounter in real life: the artificial elements that get introduced are specified explicitly, not arrived at by updating on a surprising observation. For example, the condition that Omega is trustworthy isn't something you can credibly expect to establish by observation.
The thought experiments explicitly describe the environment you play your part in, and your knowledge about it, a state of affairs that would be much harder to reach through a sequence of real-life observations, by updating your current knowledge.
I dunno, Newcomb’s Problem is often presented as a situation you’d encounter in real life. You’re supposed to believe Omega because it played the same game with many other people and didn’t make mistakes.
In any case I want a decision theory that works on real-life scenarios. For example, CDT doesn't get confused by such explosions of counterfactuals; it works perfectly fine "locally".
ETA: My argument shows that modifying yourself to never “regret your rationality” (as Eliezer puts it) is impossible, and modifying yourself to “regret your rationality” less rather than more requires elicitation of your prior with humanly impossible accuracy (as you put it). I think this is a big deal, and now we need way more convincing problems that would motivate research into new decision theories.
If you do present observations that move the beliefs to represent the thought experiment, it’ll work just as well as the magically contrived thought experiment. But the absence of relevant No-megas is part of the setting, so it too should be a conclusion one draws from those observations.
Yes, but you must make the precommitment to love Omegas and hate No-megas (or vice versa) before you receive those observations, because that precommitment of yours is exactly what they’re judging. (I think you see that point already, and we’re probably arguing about some minor misunderstanding of mine.)
You never have to decide in advance, to precommit. Precommitment is useful as a signal to those that can’t follow your full thought process, and so you replace it with a simple rule from some point on (“you’ve already decided”). For Omegas and No-megas, you don’t have to precommit, because they can follow any thought process.
I thought about it some more and I think you’re either confused somewhere, or misrepresenting your own opinions. To clear things up let’s convert the whole problem statement into observational evidence.
Scenario 1: Omega appears and gives you convincing proof that Upsilon doesn’t exist (and that Omega is trustworthy, etc.), then presents you with CM.
Scenario 2: Upsilon appears and gives you convincing proof that Omega doesn’t exist, then presents you with anti-CM, taking into account your counterfactual action if you’d seen scenario 1.
You wrote: “If you do present observations that move the beliefs to represent the thought experiment, it’ll work just as well as the magically contrived thought experiment.” Now, I’m not sure what this sentence was supposed to mean, but it seems to imply that you would give up $100 in scenario 1 if faced with it in real life, because receiving the observations would make it “work just as well as the thought experiment”. This means you lose in scenario 2. No?
Omega would need to convince you that Upsilon not just doesn't exist, but couldn't exist, and that's inconsistent with scenario 2. Otherwise, you haven't moved your beliefs to represent the thought experiment. Upsilon must be actually impossible (or at least far less probable) in order for it to be possible for Omega to correctly convince you of this (without deception).
Being updateless, your decision algorithm is only interested in observations insofar as they resolve logical uncertainty and say which situations you actually control (again, a sort of logical uncertainty); but observations can't refute what is logically possible, so they can't make Upsilon impossible if it wasn't already impossible.
“Omega would need to convince you that Upsilon not just doesn’t exist, but couldn’t exist, and that’s inconsistent with scenario 2.”
No it’s not inconsistent. Counterfactual worlds don’t have to be identical to the real world. You might as well say that Omega couldn’t have simulated you in the counterfactual world where the coin came up heads, because that world is inconsistent with the real world. Do you believe that?
By “Upsilon couldn’t exist”, I mean that Upsilon doesn’t live in any of the possible worlds (or only in insignificantly few of them), not that it couldn’t appear in the possible world where you are speaking with Omega.
The convention is that the possible worlds don’t logically contradict each other, so two different outcomes of coin tosses exist in two slightly different worlds, both of which you care about (this situation is not logically inconsistent). If Upsilon lives on such a different possible world, and not on the world with Omega, it doesn’t make Upsilon impossible, and so you care what it does. In order to replicate Counterfactual Mugging, you need the possible worlds with Upsilons to be irrelevant, and it doesn’t matter that Upsilons are not in the same world as the Omega you are talking to.
(How to correctly perform counterfactual reasoning on conditions that are logically inconsistent (such as the possible actions you could take that are not your actual action), or rather how to mathematically understand that reasoning, is the septillion-dollar question.)
Ah, I see. You’re saying Omega must prove to you that your prior made Upsilon less likely than Omega all along. (By the way, this is an interesting way to look at modal logic, I wonder if it’s published anywhere.) This is a very tall order for Omega, but it does make the two scenarios logically inconsistent. Unless they involve “deception”—e.g. Omega tweaking the mind of counterfactual-you to believe a false proof. I wonder if the problem still makes sense if this is allowed.
Sorry, can’t parse that, you’d need to unpack more.
Whatever our prior for encountering No-mega, it should be counterbalanced by our prior for encountering Yes-mega (who rewards you if you are counterfactually-muggable).
You haven’t considered the full extent of the damage. What is your prior over all crazy mind-reading agents that can reward or punish you for arbitrary counterfactual scenarios? How can you be so sure that it will balance in favor of Omega in the end?
In fact, I can consider all crazy mind-reading reward/punishment agents at once: For every such hypothetical agent, there is its hypothetical dual, with the opposite behavior with respect to my status as being counterfactually-muggable (the one rewarding what the other punishes, and vice versa). Every such agent is the dual of its own dual; in the universal prior, being approached by an agent is about as likely as being approached by its dual; and I don’t think I have any evidence that one agent will be more likely to appear than its dual. Thus, my total expected payoff from these agents is 0.
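(A minimal sketch of the cancellation, with made-up symbols: say agent A pays me X if I’m counterfactually-muggable and Y if I’m not, so its dual A* pays Y if I’m muggable and X if I’m not, and both get the same prior weight p. Then the pair’s expected contribution is
pX + pY (if I’m muggable) = pY + pX (if I’m not),
which is identical either way, so the pair can’t favor one disposition over the other; and if the reward and punishment are equal and opposite, the total is zero.)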
Omega itself does not belong to this class of agent; it has no dual. (ETA: It has a dual, but the dual is a deceptive Omega, which is much less probable than Omega. See below.) So Omega is the only one I should worry about.
I should add that I feel a little uneasy because I can’t prove that these infinitesimal priors don’t dominate everything when the symmetry is broken, especially when the stakes are high.
Why? Can’t your definition of dual be applied to Omega? I admit I don’t completely understand the argument.
Okay, I’ll be more explicit: I am considering the class of agents who behave one way if they predict you’re muggable and behave another way if they predict you’re unmuggable. The dual of an agent behaves exactly the same as the original agent, except the behaviors are reversed. In symbols:
An agent A has two behaviors.
If it predicts you’d give Omega $5, it will exhibit behavior X; otherwise, it will exhibit behavior Y.
The dual agent A* exhibits behavior Y if it predicts you’d give Omega $5, and X otherwise.
A and A* are equally likely in my prior.
What about Omega?
Omega has two behaviors.
If it predicts you’d give Omega $5, it will flip a coin and give you $100 on heads; otherwise, nothing. In either case, it will tell you the rules of the game.
What would Omega* be?
If Omega* predicts you’d give Omega $5, it will do nothing. Otherwise, it will flip a coin and give you $100 on heads. In either case, it will assure you that it is Omega, not Omega*.
So the dual of Omega is something that looks like Omega but is in fact deceptive. By hypothesis, Omega is trustworthy, so my prior probability of encountering Omega* is negligible compared to meeting Omega.
(So yeah, there is a dual of Omega, but it’s much less probable than Omega.)
Then, when I calculate expected utility, each agent A is balanced by its dual A*, but Omega is not balanced by Omega*.
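To make the bookkeeping concrete, here is a toy calculation of my own. The prior weights p and q and the payoffs X and Y are invented purely for illustration; only the $5/$100 figures come from the rules above.

```python
# Toy expected-payoff bookkeeping for the symmetry argument (a sketch, not anyone's real prior).
# Hypothetical agent A pays X if I'm muggable and Y if I'm not; its dual A* swaps the two.
# Both get prior weight p; Omega gets prior weight q; Omega* is dropped as negligible.

p, q = 1e-3, 1e-3        # arbitrary prior weights
X, Y = 50.0, -50.0       # arbitrary payoffs used by agent A

def expected_payoff(muggable: bool) -> float:
    a      = X if muggable else Y          # agent A's payoff
    a_star = Y if muggable else X          # its dual A*'s payoff
    # Omega: heads pays $100 if it predicted I'd hand over $5 on tails;
    # a muggable agent pays the $5 on tails, an unmuggable one gets nothing either way.
    omega = 0.5 * 100 - 0.5 * 5 if muggable else 0.0
    return p * (a + a_star) + q * omega

print(expected_payoff(True))   # the A/A* pair contributes p*(X+Y) here...
print(expected_payoff(False))  # ...and the same p*(X+Y) here, so only Omega's term differs
```

Whatever X, Y and p are, the A/A* pair adds the same p*(X+Y) to both lines, so in this toy model only Omega’s term separates being muggable from not.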
If we assume you can tell “deceptive” agents from “non-deceptive” ones and shift probability weight accordingly, then not every agent is balanced by its dual, because some “deceptive” agents probably have “non-deceptive” duals and vice versa. No?
(Apologies if I’m misunderstanding—this stuff is slowly getting too complex for me to grasp.)
The reason we shift probability weight away from the deceptive Omega is that, in the original problem, we are told that we believe Omega to be non-deceptive. The reasoning goes like this: If it looks like Omega and talks like Omega, then it might be Omega or Omega*. But if it were Omega*, then it would be deceiving us, so it’s most probably Omega.
In the original problem, we have no reason to believe that No-mega and friends are non-deceptive.
(But if we did, then yes, the dual of a non-deceptive agent would be deceptive, and so have lower prior probability. This would be a different problem, but it would still have a symmetry: We would have to define a different notion of dual, where the dual of an agent has the reversed behavior and also reverses its claims about its own behavior.
What would Omega* be in that case? It would not claim to be Omega. It would truthfully tell you that if it predicted you would not give it $5 on tails, then it would flip a coin and give you $100 on heads; and otherwise it would not give you anything. This has no bearing on your decision in the Omega problem.)
By your definitions, Omega* would condition its decision on you being counterfactually muggable by the original Omega, not on you giving money to Omega* itself. Or am I losing the plot again? This notion of “duality” seems to be getting more and more complex.
“Duality” has become more complex because we’re now talking about a more complex problem — a version of Counterfactual Mugging where you believe that all superintelligent agents are trustworthy. The old version of duality suffices for the ordinary Counterfactual Mugging problem.
My thesis is that there’s always a symmetry in the space of black swans like No-mega.
In the case currently under consideration, I’m assuming Omega’s spiel goes something like “I just flipped a coin. If it had been heads, I would have predicted what you would do if I had approached you and given my spiel....” Notice the use of first-person pronouns. Omega* would have almost the same spiel verbatim, also using first-person pronouns, and make no reference to Omega. And, being non-deceptive, it would behave the way it says it does. So it wouldn’t condition on your being muggable by Omega.
You could object to this by claiming that Omega actually says “I am Omega. If Omega had come up to you and said....”, in which case I can come up with a third notion of duality.
If Omega* makes no reference to the original Omega, I don’t understand why they have “opposite behavior with respect to my status as being counterfactually-muggable” (by the original Omega), which was your reason for inventing “duality” in the first place. I apologize, but at this point it’s unclear to me that you actually have a proof of anything. Maybe we can take this discussion to email?
Surely the last thing on anyone’s mind, having been persuaded they’re in the presence of Omega in real life, is whether or not to give $100 :)
I like the No-mega idea (it’s similar to a refutation of Pascal’s wager by invoking contrary gods), but I wouldn’t raise my expectation for the number of No-mega encounters I’ll have by very much upon encountering a solitary Omega.
Generalizing No-mega to include all sorts of variants that reward stupid or perverse behavior (are there more possible God-likes that reward things strange and alien to us?), I’m not in the least bit concerned.
I suppose it’s just a good argument not to make plans for your life on the basis of imagined God-like beings. There should be as many gods who are pleased by things you’d consider arbitrary and who, when pleased with your actions, intervene in your life in ways you would not consider pleasant, as gods who have values similar to ours that they’d like us to express and/or actually reward us copacetically.
“I wouldn’t raise my expectation for the number of No-mega encounters I’ll have by very much upon encountering a solitary Omega.”
You don’t have to. Both Omega and No-mega decide based on what your intentions were before seeing any supernatural creatures. If right now you say “I would give money to Omega if I met one”—factoring in all belief adjustments you would make upon seeing it—then you should say the reverse about No-mega, and vice versa.
ETA: Listen, I just had a funny idea. Now that we have this nifty weapon of “exploding counterfactuals”, why not apply it to Newcomb’s Problem too? It’s an improbable enough scenario that we can make up a similarly improbable No-mega that would reward you for counterfactual two-boxing. Damn, this technique is too powerful!
When I say I don’t believe No-mega becomes probable just because I saw an Omega, I mean that I plan on considering such situations as they arise, on the assumption that only the types of godlike beings I’ve seen to date (so far, none) exist. I’m inclined to say that I’ll decide in the way that makes me happiest, provided I believe that the godlike being is honest and really can know my precommitment.
I realize this leaves me vulnerable to the first godlike huckster offering me a decent exclusive deal; I guess this implies that I think I’m much more likely to encounter 1 godlike being than many.