Counterfactual Mugging and Logical Uncertainty
Followup to: Counterfactual Mugging.
Let’s see what happens with Counterfactual Mugging if we replace the uncertainty about an external fact (how a coin lands) with logical uncertainty, for example about the n-th digit in the decimal expansion of pi.
The original thought experiment is as follows:
Omega appears and says that it has just tossed a fair coin, and given that the coin came up tails, it decided to ask you to give it $100. Whatever you do in this situation, nothing else will happen differently in reality as a result. Naturally you don’t want to give up your $100. But Omega also tells you that if the coin came up heads instead of tails, it’d give you $10000, but only if you’d agree to give it $100 if the coin came up tails.
Let’s change “coin came up tails” to “10000-th digit of pi is even”, and correspondingly for heads. This gives Logical Counterfactual Mugging:
Omega appears and says that it has just found out that the 10000th decimal digit of pi is 8, and given that it is even, it decided to ask you to give it $100. Whatever you do in this situation, nothing else will happen differently in reality as a result. Naturally you don’t want to give up your $100. But Omega also tells you that if the 10000th digit of pi had turned out to be odd instead, it’d give you $10000, but only if you’d agree to give it $100 given that the 10000th digit is even.
This form of Counterfactual Mugging may be instructive, as it slaughters the following false intuition, or equivalently conceptualization of “could”: “the coin could land either way, but a logical truth couldn’t be either way”.
For the following, let’s shift the perspective to Omega, and consider the problem for the 10001st digit, which is 5 (odd). It’s easy to imagine that, given that the 10001st digit of pi is in fact 5 and you decided to give away the $100 only if the digit is odd, Omega’s prediction of your actions will still be that you’d give away $100 (because the digit is in fact odd). A direct prediction of your actions can’t include the part where you observe that the digit is even, because the digit is odd.
But Omega doesn’t compute what you’ll do in reality; it computes what you would do if the 10001st digit of pi were even (which it isn’t). If you decline to give away the $100 when the digit is even, Omega’s simulation of the counterfactual where the digit is even will say that you wouldn’t oblige, and so you won’t get the $10000 in reality, where the digit is odd.
Imagine it constructively this way: you have the code of a procedure, Pi(n), that computes the n-th digit of pi once it’s run. If your strategy is
if (Is_Odd(Pi(n))) then Give("$100");
then, given that n==10001, Pi(10001)==5, and Is_Odd(5)==true, the program outputs “$100”. But Omega tests the output of the code after performing surgery on it, replacing Is_Odd(Pi(n)) with false instead of the true to which it normally evaluates. Thus it’ll be testing the code
if (false) then Give("$100");
This counterfactual case doesn’t give away $100, and so Omega decides that you won’t get the $10000.
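The surgery described above can be sketched in Python using the standard `ast` module. This is only an illustration under loud assumptions: the agent's source is a tiny string (`AGENT_SOURCE`), the parity test is an explicit call to a function named `is_odd`, and `pi_digit` simply looks up the two digit values quoted in this post rather than computing pi. All of these names are hypothetical. It covers the easy case where the subexpression is visible in the code, not the obfuscated case discussed later.

```python
import ast

# Digit values as stated in the post; a real Pi(n) would compute them.
PI_DIGITS = {10000: 8, 10001: 5}

# Hypothetical agent source, mirroring: if(Is_Odd(Pi(n))) then Give("$100")
AGENT_SOURCE = '''
def agent(n):
    if is_odd(pi_digit(n)):
        return "$100"
    return None
'''

def pi_digit(n):
    return PI_DIGITS[n]

def is_odd(d):
    return d % 2 == 1

class ForceParity(ast.NodeTransformer):
    """Replace every call to is_odd(...) with a fixed boolean constant."""

    def __init__(self, value):
        self.value = value

    def visit_Call(self, node):
        if isinstance(node.func, ast.Name) and node.func.id == "is_odd":
            return ast.copy_location(ast.Constant(value=self.value), node)
        return self.generic_visit(node)

def build_agent(surgery=None):
    """Compile the agent, optionally after applying Omega's surgery."""
    tree = ast.parse(AGENT_SOURCE)
    if surgery is not None:
        tree = ast.fix_missing_locations(surgery.visit(tree))
    namespace = {"is_odd": is_odd, "pi_digit": pi_digit}
    exec(compile(tree, "<agent>", "exec"), namespace)
    return namespace["agent"]

original = build_agent()                          # what you do in reality
counterfactual = build_agent(ForceParity(False))  # what Omega tests
```

Here `original(10001)` gives away the $100, while `counterfactual(10001)` does not, so under this sketch Omega would withhold the $10000.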
For the original problem, when you consider what would happen if the coin fell differently, you are basically performing the same surgery, replacing the knowledge about the state of the coin in the state of mind. If you use the (wrong) strategy
if (Coin=="heads") then Give("$100");
and the coin comes up “heads”, so that Omega is deciding whether to give you $10000, then Coin=="heads", but Omega is evaluating the modified algorithm where Coin is replaced by “tails”:
if ("tails"=="heads") then Give("$100");
Another way of intuitively thinking about Logical CM is to consider the index of the digit (here, 10000 or 10001) to be a random variable. Then, the choice of number n (value of the random variable) in Omega’s question is a perfect analogy with the outcome of a coin toss.
With a random index instead of “direct” mathematical uncertainty, the above evaluation of the counterfactual substitutes (say) 10000 for n (so that Is_Odd(Pi(10000))==false), instead of directly substituting false for Is_Odd(Pi(n)):
if (Is_Odd(Pi(10000))) then Give("$100");
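With the index treated as the parameter, no surgery on the code is needed: the counterfactual is just the same program run on a different n. A minimal sketch, assuming a hypothetical lookup table holding only the two digit values quoted in this post:

```python
# Digit values as stated in the post; a real Pi(n) would compute them.
PI_DIGITS = {10000: 8, 10001: 5}

def is_odd_pi(n):
    """Parity of the n-th decimal digit of pi (lookup only, for illustration)."""
    return PI_DIGITS[n] % 2 == 1

def strategy(n):
    # Mirrors: if(Is_Odd(Pi(n))) then Give("$100")
    return "$100" if is_odd_pi(n) else None

actual_action = strategy(10001)          # the index Omega actually drew
counterfactual_action = strategy(10000)  # the index Omega substitutes
```

The program text is untouched; only its input changes, which is what makes n (like Coin) an atomic parameter.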
The difference is that with the coin or random digit number, the parameter is explicit and atomic (Coin and n, respectively), while with the oddness of the n-th digit, the parameter Is_Odd(Pi(n)) isn’t atomic. How can it be detected in the code (in the mind), given that it could be written in obfuscated assembly and not even be an explicit subexpression of the program? By the connection to the sense of the problem statement itself: when you are talking about what you’ll do if the n-th digit of pi is even or odd, or what Omega will do if you give or don’t give away $100 in each case, you are talking about exactly your Is_Odd(Pi(n)), or something from which this code will be constructed. The meaning of the procedure Pi(n) is dependent on the meaning of the problem, and through this dependency counterfactual surgery can reach down and change the details of the algorithm to answer the counterfactual query posed by the problem.
Would you (or your ideal of rationality) still give $100 if I replace “10000th decimal digit of pi” with “the 10000th positive integer”, or with “the smallest non-negative integer”, or with just “0”?
If not, what’s special about “10000th decimal digit of pi”? (Apparently you’re assuming that you can compute it in your head, so that’s not the difference.)
If yes, how do you (or Omega) compute a counterfactual where 0 is odd, or 1 is even?
Upon further thought: there is no objective answer to “what you would do if 1 was even” or “what you would do if the 10001st digit of pi was even” (given your source code). The answer that Omega computes has to be more or less arbitrary, and depends on details of Omega’s source code. If you knew that Omega was going to logical-counterfactually mug you, and you knew Omega’s source code, and the reward was high enough, then you’d make whatever modifications are necessary to your own source code so that Omega would compute the “right” answer and reward you.
Therefore, if we include such problems in the problem class for which a decision algorithm should be reflectively consistent, then no decision algorithm is reflectively consistent.
ETA: Notice that in the version of CM with a physical coin, or with the n-th digit of pi where Omega is not computing what you would do if it was even or odd, but what you would do if you were told that it is even or odd, there is an objective answer to “what you would do if you were to receive the input ‘coin landed tails’” and “what you would do if you were to receive the input ‘10000-th digit of pi is odd’”, which simply involves running your source code on the given input.
My understanding of the point of the post was that while a coin may physically land differently and thus instantiate the counterfactual, it is merely my current lack of knowledge (the “logical uncertainty” in the post title) that allows me to simulate a kind of pseudo-counterfactual in this case.
Since I do not know the millionth digit of pi, I can still speak meaningfully of the cases where it is and isn’t odd.
The 10001st digit of pi is 5.
The simplest case is when a fact that is being considered counterfactually is received from a given observation, so that you can explicitly say where the parameter is in the system, and use the dynamic specification of the system to see what happens to it depending on the parameter. That’s the case with the coin and random digit index.
The 10000th digit of pi is one step more complicated, but it’s still independent of most of your knowledge, so it’s conceptually easier to localize knowledge about it in your mind. Once you start considering the question, knowledge about its answer starts affecting your dynamic, and this influence can likewise be tracked to the source. That’s why I introduced Pi(n) as a local expression: all the knowledge in the algorithm about the answer to this question comes from this single procedure, so by varying its contents you can examine the impact of its different values on the future behavior.
Whether or not 1 is even is much more pervasive, so the surgery that changes it will be hard and not at all intuitively obvious. So, the disagreement seems to be that you trust your intuition about whether it’s possible to make 1 an even number in your mind, while I trust the generalization of the idea that you can change whether the coin lands on one side or another, whether Pi(10000) is even or odd, and arbitrarily more pervasive questions as well.
This does depend a lot on what Omega understands by the question (how Omega’s algorithm logically depends on the question, and on your algorithm), which is related to my unwillingness to conclude that mutual cooperation is the clear-cut outcome of PD. In this thought experiment, this understanding is mostly specified; in other cases an intuitive grasp of the problem won’t be enough.
If a theory of logical counterfactuals is to apply to statements of the form “If X was true, then Y would be true”, do we need to restrict the forms of X and Y, or can they be arbitrary mathematical propositions?
For example, does it make sense to ask something like, “What is 13*3, if 3*3 was 8?” An obvious answer is “38”, but what if you’re doing multiplication in binary?
I don’t see why a theory of counterfactuals couldn’t apply to mathematical propositions. After all, our cognitive architectures use causality at a primitive level, and the same architecture is taught math.
And certainly, while learning math, you were taught results that didn’t “seem” right at the time, so you worked backwards until you could understand why that result (like 2+6 = 8) makes sense.
So you just have to imagine yourself in such a similar situation about math, learning it for the first time. If everyone in class seemed to understand multiplication but you, and it were also a fact that 3*3 = 8, what process would you figure was actually going on when you multiply? Then, apply that to 13*3.
To this I ask: “Which 3*3?”. The whole procedure is something that is done with a description of program (system), and any facts of which we can speak as holding for the system are properties of the system’s “mind”. Thus, the fact of what 3*3 is must be located somewhere specifically (more generally, as a property), for it to be meaningful to talk about this fact in relation to the system. You are considering interaction between this fact, as parameter, and the rest of the system, and this activity requires seeing both on equal rights.
When you, as a human, read the question, you may try to interpret it as pointing to a specific subsystem, as I did in the post. More generally, the question is only meaningful in this way if it admits such an interpretation.
I think I sort of see what you mean. Perhaps this is an avenue worth exploring, given that we don’t seem to have many other suggestions on how to solve logical uncertainty. I’ll have to think on this more.
The 10000th decimal digit of pi is 8, by the way (not counting the leading 3).
What does Omega do if your algorithm contains “if ⊥ is provable, then give $100”?
Could you be more explicit about the intention of the question?
Omega’s surgery doesn’t introduce contradictions in your code: it doesn’t (say) make Pi(10000) evaluate to 7, it just replaces Pi(10000) in the code with 7, which gives perfectly good code, just different from the original.
Perhaps this version of Counterfactual Mugging is not really about logical uncertainty, but rather uncertainty about one’s source code. In UDT1, I assumed that an agent would know its own source code with certainty, but here, if we suppose that Omega does its counterfactual prediction using source-code surgery plus simulation, then our agent can’t distinguish whether it’s the agent with the original source code, or the one in the simulation with the modified source code.
Although I haven’t worked out the details, it seems possible to modify UDT1 to handle uncertainty about one’s source code, with the result that such an agent would give Omega $100 in this situation. Basically, when Is_Odd(Pi(n)) returns “true”, you would think:
Did it return “true” because Pi(n) is odd, or because Pi(n) is even and my source code has been modified for it to return true anyway? I don’t know, so I don’t know whether Pi(n) is even or odd, and I better act as if I don’t know.
This doesn’t seem to require slaughtering the intuition that “a logical truth couldn’t be either way” because I can think that a logical truth couldn’t be either way but I just don’t know which way it is, and that still allows me to make the right decision. Do you agree, or do you still think that intuition needs to go?
I’d say things differently now. I’d drop the distinction between “logical uncertainty” and uncertainty about the output of one’s source code, as knowledge about a formal system basically is a program that you can run, which basically is part of your source code (maybe with observed data, but then the data became part of you; what distinguishes you observing an event from the event observing you? It’s more like merging). The important intuition in this case is that there is no transparency: having the source code of a program is not at all the same thing as knowing how it behaves (it’s not even about the halting problem, as simple calculations are still some computational steps away, although static analysis (abstraction) may allow running infinitely faster). You are not uncertain about your source code, you are uncertain about what it’ll do. Logical hypotheticals can be seen as playing the central role in decision-making, as the steps in proof search that suggest the steps that one’s own (known) algorithms could make, and seeing whether they should be made real (“winning”, in game semantics terminology, which is highly misleading from the goal-directed strategy point of view, as they only won your choice, not the “game”). While you can’t reach some logical truths in a limited time, you can consider their hypothetical states; thus the program isn’t so much being modified as it is being refined where its consequences can’t be directly observed (with a naive formalism the difference between the program and its effect blurs). I still have serious gaps in my understanding of this stuff, so I am not ready to describe it yet.
If things that “could” be done or “could” happen are ones considered in hypotheticals during decision-making, then logical truths (possible behaviors of a program) should be comfortable as things that could be either way.
Omega could have known the 10000th digit of pi beforehand, so now it has a strategy for reliably extracting money from you. Or do you place some other restriction on Omega’s behavior that I’m not seeing?
As in the original problem, Omega’s decision to run the game is independent of the game’s parameters. Omega is not trying to win; it just implements the thought experiment.
I can’t tell what your recommended action is—do you give the $100 or not?
I think this case is essentially the same as the original one, and this similarity is the topic of the post.
It looks like in the original case (and so this one) you should give the $100 if you are an AI running human preference, and most likely if you are a human too, unless human preference gets “updated” (corrupted) by the reflectively inconsistent human brain, so that once you learn about the new fact, the new preference says that you shouldn’t give the $100, because the probability of the alternative dropped through the floor (in your representation).
Where is the best place to read an explanation of why giving the $100 is what you “should” do? (Or could someone please summarize the rationale?)
You can read the first thread: the post gives a short description of the theoretical reasons for giving up $100 (expected utility, reflective consistency), and there is more in the comments.
As I noted, I’m not sure it’s what you really should do, as a human, but it looks like it. I changed my mind about this conclusion a couple of times since the problem statement, first believing that you should give up $100, because it was what UDT suggested, then that you shouldn’t, remembering that the human brain probably does erase the counterfactual preference; now I’m back to being unsure about what goes on in the human brain, but trusting the normative theory as a better standard for decisions in the meantime.
Reading through the comments of that post, I understood this to be the gist of the argument for why you would give up the $100:
Before knowing the outcome of the coin flip, you would have taken the wager to pay $100 for a 50% chance to win $10000. Alternatively, if Omega had asked you to “precommit” $100 in case you lost, you would still agree, since it’s nearly exactly the same thing. (Technically it’s an even better wager.) What if Omega asks you to precommit a witless future self? You would like to pre-commit your future self.
So you, your current self, while trying to decide whether to pay Omega or not, have decided that you would actually like to precommit a future self to paying the $100. How do you do that? By being that future person in the present and committing your current self to pay the $100. Indeed you lost, but being consistent with “being a payer” is what you decided you wanted.
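The pre-flip wager in this argument can be checked with a little arithmetic. A minimal sketch, assuming a fair coin and the payoffs from the problem statement; the function and parameter names are hypothetical:

```python
from fractions import Fraction

def expected_value(pays_when_asked, p_heads=Fraction(1, 2), reward=10000, cost=100):
    """Expected payoff of a policy, evaluated before the coin is tossed."""
    # Heads: Omega rewards you only if your policy would pay on tails.
    heads_payoff = reward if pays_when_asked else 0
    # Tails: Omega asks for the money, and a paying policy hands it over.
    tails_payoff = -cost if pays_when_asked else 0
    return p_heads * heads_payoff + (1 - p_heads) * tails_payoff

ev_payer = expected_value(True)     # 1/2 * 10000 - 1/2 * 100
ev_refuser = expected_value(False)  # never pays, never rewarded
```

Before the toss, the paying policy is worth $4950 in expectation and the refusing policy $0, which is the sense in which you would want to commit your future self.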