Benya comments on Self-modification is the correct justification for updateless decision theory

Benya 11 Apr 2010 21:45 UTC
3 points

I’m wondering where this particular bit of insanity (from my perspective) is coming from.

Well, let’s see whether I can at least state my position clearly enough that you know what it is, even if you think it’s insane :-)

Or do you rely on the fact that the hypothetical you isn’t “really” (because it doesn’t exist) conscious? In that case you probably also think you can safely two-box if the boxes are transparent, you see money in both boxes and Omega told you it doesn’t use any conscious simulations. (you can’t, btw, because consciousness doesn’t have magic powers either).

My argument is that since I’m here, and since I wouldn’t be if Omega destroyed the solar system a million years ago and nobody ever simulated me, I know that Omega didn’t destroy the solar system. It seems that in what you’ve quoted above, that’s what you’re guessing my position is.

I’m not sure whether I have accepted timelessness enough to change myself so that I would one-box in Newcomb with transparent boxes. However, if I thought that I would two-box, and Omega tells me that it has, without using conscious simulations, predicted whether I would take both boxes (and save 1,001 lives) or only one box (and save 1,000 lives), and only in the latter case had filled both boxes, and now I’m seeing both boxes full in front of me, I should be very confused: one of my assumptions must be wrong. The problem posed seems inconsistent, like if you ask me what I would do if Omega offered me Newcomb’s problem and as an aside told me that six times nine equals fourty-two.

Perhaps this simplification of the original thought experiment will help make our respective positions clearer: Suppose that Omega appears and tells me that a million years ago, it (flipped a coin | calculated the umpteenth digit of pi), and (if it came up heads | was even), then it destroyed the solar system. It didn’t do any simulations or other fancy stuff. In this case, I would conclude from the fact that I’m here that (the coin came up tails | the digit was odd).

I’m curious: would you say that I’m wrong to make that inference, because “consciousness isn’t magic” (in my mind I don’t think I’m treating it as such, of course), or does Omega making a prediction without actually simulating me in detail make a difference to you?
- FAWS 11 Apr 2010 22:44 UTC
  2 points
  Parent
  
  Perhaps this simplification of the original thought experiment will help make our respective positions clearer: Suppose that Omega appears and tells me that a million years ago, it (flipped a coin | calculated the umpteenth digit of pi), and (if it came up heads | was even), then it destroyed the solar system. It didn’t do any simulations or other fancy stuff. In this case, I would conclude from the fact that I’m here that (the coin came up tails | the digit was odd).
  
  In this case this is unproblematic because there is no choice involved. But when the choice is entangled with the existence of the scenario/the one making the choice you can’t simultaneously assume choice and existence, because your choice won’t rewrite other things to make them consistent.
  
  Simple example: Omega appears and tells you it predicted (no conscious simulations) you will give it $100. If you wouldn’t Omega would instead give you $100. Omega is never wrong. Should you give Omega $100? Of course not. Should you anticipate that Omega is wrong, or that some force will compel you, that lightening from the clear sky strikes you down before you can answer, that Omega disappears in a pink puff of logic, that you disappear in a pink puff of logic? It doesn’t really matter, as long as you make sure you don’t hand over $100. Personally I’d assume that I retroactively turn out not to exist because the whole scenario is only hypothetical (and of course my choice can’t change me from a hypothetical person into a real person no matter what).
  
  For you to get the $100 there needs to be a fact about what you would do in the hypothetical scenario of Omega predicting that you give it $100, and the only way for that fact to be what you want it to be is to actually act like you want the hypothetical to act. That means when confronted with apparent impossibility you must not draw any conclusions form the apparent contradiction that differentiate the situation from the hypothetical. Otherwise you will be stuck with the differentiated situation as the actual hypothetical. To get the benefit of hypothetically refusing to give $100 you must be ready to actually refuse to give $100 and disappear in a puff of logic. So far so uncontroversial, I assume.
  
  Now, take the above and change it to Omega predicting you will give it $100 unless X is true. Nothing important changes, at all. You can’t make X true or untrue by changing your choice. If X is “the sky is green” your choice will not change the color of the sky. If X is that the first digit of pi is even your choice will not change pi. If X is that you have a fatal heart disease you cannot cure yourself by giving Omega $100. Whether you already know about X doesn’t matter, because ignorance doesn’t have magical powers, even if you add consciousness to the mix.
  - Benya 12 Apr 2010 0:13 UTC
    1 point
    Parent
    
    Now, take the above and change it to Omega predicting you will give it $100 unless X is true. Nothing important changes, at all. You can’t make X true or untrue by changing your choice.
    
    Wait, are you thinking I’m thinking I can determine the umpteenth digit of pi in my scenario? I see your point; that would be insane.
    
    My point is simply this: if your existence (or any other observation of yours) allows you to infer the umpteenth digit of pi is odd, then the AI you build should be allowed to use that fact, instead of trying to maximize utility even in the logically impossible world where that digit is even.
    
    The goal of my thought experiment was to construct a situation like in Wei Dai’s post, where if you lived two million years ago you’d want your AI to press the button, because it would give humanity a 50% chance of survival and a 50% chance of later death instead of a 50% chance of survival and a 50% chance of earlier death; I wanted to argue that despite the fact that you’d’ve built the AI that way two million years ago, you shouldn’t today, because you don’t want it to maximize probability in worlds you know to be impossible.
    
    I guess the issue was muddled by the fact that my scenario didn’t clearly rule out the possibility that the digit is even but you (the human AI creator) are alive because Omega predicted the AI would press the button. I can’t offhand think of a modification of my original thought experiment that would take care of that problem and still be obviously analgous to Wei Dai’s scenario, but from my perspective, at least, nothing would change in my argument if, if the digit is even, and Omega predicted that the AI would press the button and so Omega didn’t destroy the world, then Omega turned Alpha Centauri purple; since Alpha Centauri isn’t purple, you can conclude that the digit is odd. [Edit: changed the post to include that proviso.]
    
    (But if you had built your AI two million years ago, you’d’ve programmed it in such a way that it would press the button even if it observes Alpha Centauri to be purple—because then, you would really have to make the ⁵⁰⁄₅₀ decision that Wei Dai has in mind.)
    What links here?
    Self-modification is the correct justification for updateless decision theory by Benya (11 Apr 2010 16:39 UTC; 25 points)
    - FAWS 12 Apr 2010 0:48 UTC
      −2 points
      Parent
      
      Wait, are you thinking I’m thinking I can determine the umpteenth digit of pi in my scenario? I see your point; that would be insane.
      
      My point is simply this: if your existence (or any other observation of yours) allows you to infer the umpteenth digit of pi is odd, then the AI you build should be allowed to use that fact, instead of trying to maximize utility even in the logically impossible world where that digit is even.
      
      Actually you were: There are four possibilities:
      
      The AI will press the button, the digit is even
      The AI will not press the button, the digit is even, you don’t exist
      The AI will press the button, the digit is odd, the word will kaboom
      The AI will not press the button, the digit is odd.
      
      Updating on the fact that the second possibility is not true is precisely equivalent to concluding that if the AI does not press the button the digit must be odd, and ensuring that the AI does not means choosing the digit to be odd.
      
      If you already know that the digit is odd independent from the choice of the AI the whole thing reduces to a high stakes counterfactual mugging (if the destruction by Omega if the digit is even depends on what the AI knowing the digit to be odd would do, otherwise there is no dilemma in the first place).
      - Tyrrell_McAllister 12 Apr 2010 2:49 UTC
        1 point
        Parent
        
        Updating on the fact that the second possibility is not true is precisely equivalent to concluding that if the AI does not press the button the digit must be odd, and ensuring that the AI does not means choosing the digit to be odd.
        
        There is nothing insane about this, provided that it is properly understood. The resolution is essentially the same as the resolution of the paradox of free will in a classically-deterministic universe.
        
        In a classically-deterministic universe, all of your choices are mathematical consequences of the universe’s state 1 million years ago. And people often confused themselves by thinking, “Suppose that my future actions are under my control. Well, I will choose to take a certain action if and only if certain mathematical propositions are true (namely, the propositions necessary to deduce my choice from the state of the universe 1 million years ago). Therefore, by choosing to take that action, I am getting to decide the truth-values of those propositions. But the truth-values of mathematical propositions is beyond my control, so my future actions must also be beyond my control.”
        
        I think that people here generally get that this kind of thinking is confused. Even if we lived in a classically-deterministic universe, we could still think of ourselves as choosing our actions without concluding that we get to determine mathematical truth on a whim.
        
        Similarly, Benja’s AI can think of itself as getting to choose whether to push the button without thereby implying that it has the power to modify mathematical truth.
        Benya 12 Apr 2010 3:01 UTC
        2 points
        Parent
        
        Similarly, Benja’s AI can think of itself as getting to choose whether to push the button without thereby thinking that it has the power to modify mathematical truth.
        
        I think we’re all on the same page about being able to choose some mathematical truths, actually. What FAWS and I think is that in the setup I described, the human/AI does not get to determine the digit of pi, because the computation of the digits of pi does not involve a computation of the human’s choices in the thought experiment. [Unless of course by incredible mathematical coincidence, the calculation of digits of pi happens to be a universal computer, happens to simulate our universe, and by pure luck happens to depend on our choices just at the umpteenth digit. My math knowledge doesn’t suffice to rule that possibility out, but it’s not just astronomically but combinatorially unlikely, and not what any of us has in mind, I’m sure.]
      - Benya 12 Apr 2010 1:53 UTC
        0 points
        Parent
        I’ll grant you that my formulation had a serious bug, but--
        
        There are four possibilities:
        
        The AI will press the button, the digit is even
        The AI will not press the button, the digit is even, you don’t exist
        The AI will press the button, the digit is odd, the word will kaboom
        The AI will not press the button, the digit is odd.
        
        Updating on the fact that the second possibility is not true is precisely equivalent to concluding that if the AI does not press the button the digit must be odd
        
        Yes, if by that sentence you mean the logical proposition (AI presses button ⇒ digit is odd), also known as (digit odd \/ ~AI presses button).
        
        and ensuring that the AI does not means choosing the digit to be odd.
        
        I’ll only grant that if I actually end up building an AI that presses the button, and the digit is even, then Omega is a bad predictor, which would make the problem statement contradictory. Which is bad enough, but I don’t think I can be accused of minting causality from logical implication signs...
        
        In any case,
        
        If you already know that the digit is odd independent from the choice of the AI the whole thing reduces to a high stakes counterfactual mugging
        
        That’s true. I think that’s also what Wei Dai had in mind in http://lesswrong.com/lw/214/late_great_filter_is_not_bad_news/ of the great filter post (and not the ability to change Omega’s coin to tails by not pressing the button!). My position is that you should not pay in counterfactual muggings whose counterfactuality was already known prior to your decision to become a timeless decision theorist, although you should program (yourself | your AI) to pay in counterfactual muggings you don’t yet know to be counterfactual.