Omega comes along and says “I ran a simulation to see if you would one-box in Newcomb. The answer was yes, so I am now going to feed you to the Ravenous Bugblatter Beast of Traal. Have a nice day.”
That’s more like a Counterfactual Mugging, which is the domain of Nesov-Dai updateless decision theory—you’re being rewarded or punished based on a decision you would have made in a different state of knowledge, which is not “you” as I’m defining this problem class. (Which again may sound quite restrictive at this point, but if you look at 95% of the published Newcomblike problems...)
What you need here is for the version of you that Omega simulates facing Newcomb’s Box to know that another Omega is going to reward another version of itself (that it cares about) based on its current logical output. If the simulated decision system doesn’t know/believe this, then you really are screwed, but that is more because Omega now really is an unfair bastard (i.e. doing something outside the problem class): you’re being punished based on the output of a decision system that didn’t know that the event depended on its output. It is sort of like Omega, entirely unbeknownst to you, watching you from a rooftop and sniping you if you eat a delicious sandwich.
If the version of you facing Newcomb’s Problem has a prior over Omega doing things like this, even if the other you’s observed reality seems incompatible with that possible world, then this is the sort of thing handled by updateless decision theory.
Right. But then if that is the (reasonable) criterion under which TDT operates, it seems to me that it does indeed handle the case of Omega’s after-the-fact coin flip bet, in the same way that it handles (some versions of) Newcomb’s problem. How do you figure that it doesn’t?
Because the decision diagonal I wrote out handles the probable consequences of “this computation” doing something, given its current state of knowledge (its current, updated P). So if it already knows the coinflip (especially a logical coinflip like a binary digit of pi) came up heads, and this coinflip has nothing counterfactually to do with its decision, then it won’t care about what Omega would have done if the coin had come up tails, and the currently executing decision diagonal says “don’t pay”.
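To make the role of the updated P concrete, here is a minimal Python sketch of the coin flip bet. It is a toy expected-utility comparison, not the TDT or UDT formalism itself. The payoffs (`COST_IF_ASKED`, `REWARD_IF_PAYER`) and the function names are hypothetical, since the thread gives no numbers, and Omega is assumed to predict the agent's policy perfectly.

```python
# Hypothetical payoffs; the thread does not specify any numbers.
COST_IF_ASKED = 100        # paid on the branch where Omega asks for money
REWARD_IF_PAYER = 10_000   # received on the other branch, only if the agent would pay


def expected_utility(pays: bool, p_asked: float) -> float:
    """Expected utility of the policy `pays`.

    p_asked is the probability the agent assigns to the branch in which
    Omega asks for payment; 1 - p_asked goes to the reward branch.
    """
    asked_branch = -COST_IF_ASKED if pays else 0
    reward_branch = REWARD_IF_PAYER if pays else 0
    return p_asked * asked_branch + (1 - p_asked) * reward_branch


def best_policy(p_asked: float) -> bool:
    """Return True if paying maximizes expected utility under p_asked."""
    return expected_utility(True, p_asked) > expected_utility(False, p_asked)


# Same utility function in both calls; only the probability distribution differs.
prior_choice = best_policy(0.5)    # evaluated before the coin is known
updated_choice = best_policy(1.0)  # evaluated after updating on "I was asked"
print(prior_choice, updated_choice)
```

Under these assumed numbers the policy chosen from the prior is to pay, while the policy chosen after updating is not to pay, even though the utility function is identical in both cases.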
Ah! so you’re defining “this” as exact bitwise match, I see. Certainly that helps make the conclusions more rigorous. I will suggest the way to handle the after-the-fact coin flip bet is to make the natural extension to sufficiently similar computations.
Note that even selfish agents must do this in order to care about themselves five minutes in the future.
To further motivate the extension, consider the variant of Newcomb where just before making your choice, you are given a piece of paper with a large number written on it; the number has been chosen to be prime or composite depending on whether the money is in the opaque box.
Ah! so you’re defining “this” as exact bitwise match
That’s not the problem. The problem is that you’ve already updated your probability distribution, so you just don’t care about the cases where the binary digit came up 0 instead of 1: not because your utility function isn’t over them, but because they have negligible probability.
the number has been chosen to be prime or composite depending on whether the money is in the opaque box
(First read that variant in Martin Gardner.) The epistemically intuitive answer is “Once I choose to take one box, I will be able to infer that this number has always been prime”. If I wanted to walk through TDT doing this, I’d draw a causal graph with Omega’s choice descending from my decision diagonal, and sending a prior-message in turn to the parameters of a child node that runs a primality test over numbers and picks this number because it passed (failed). Then, having decided your logical choice, seeing this number becomes evidence that its primality test came up positive.
In terms of logical control, you don’t control whether the primality test comes up positive on this fixed number, but you do control whether this number got onto the box-label by passing a primality test or a compositeness test.
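The distinction between controlling a number's primality and controlling which test the label passed can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Omega is modeled as a perfect predictor, and `omega_label` and the candidate range are hypothetical names and values not taken from the thread.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; fine for small illustrative numbers."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def omega_label(one_boxes: bool, candidates):
    """Return (label, box_full) for an agent with the given policy.

    Assumption: Omega predicts perfectly, so the opaque box is full exactly
    when the agent one-boxes. Omega then writes down the first candidate
    that is prime if the box is full, or composite if it is empty.
    """
    box_full = one_boxes
    for n in candidates:
        if is_prime(n) == box_full:
            return n, box_full
    raise ValueError("no suitable candidate number")


candidates = range(1000, 1100)  # all >= 2, so "not prime" means composite
print(omega_label(True, candidates))   # label chosen for a one-boxer
print(omega_label(False, candidates))  # label chosen for a two-boxer
```

Here `is_prime(n)` returns the same answer for any fixed `n` regardless of the agent's policy. What the policy changes is which test `omega_label` applied when selecting the number, so seeing the label is evidence about the box only once the policy is known.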
(I don’t remember where I first read that variant, but Martin Gardner sounds likely.) Yes, I agree with your analysis of it—but that doesn’t contradict the assertion that you can solve these problems by extending your utility function across parallel versions of you who received slightly different sensory data. I will conjecture that this turns out to be the only elegant solution.
Sorry, that doesn’t make any sense. It’s a probability distribution that’s the issue, not a utility function. UDT tosses out the probability distribution entirely. TDT still uses it and therefore fails on Counterfactual Mugging.
It’s precisely the assertion that all such problems have to be solved at the probability distribution level that I’m disputing. I’ll go so far as to make a testable prediction: it will eventually be acknowledged that the notion of a purely selfish agent is a good approximation that nonetheless cannot handle such extreme cases. If you can come up with a theory that handles them all without touching the utility function, I will be interested in seeing it!
I will suggest the way to handle the after-the-fact coin flip bet is to make the natural extension to sufficiently similar computations.
It might be nontrivial to do this in a way that doesn’t automatically lead to wireheading (using all available power to simulate many extremely fulfilled versions of itself). Or is that problem even more endemic than this?
None of the decision theories in question assume a purely selfish agent.
No, but most of the example problems do.