(Sorry about the late reply. I’m not sure how I missed this post.)
Suppose you’re right and we do want to build an AI that would not press the button in this scenario. How do we go about it?
1. We can’t program “the umpteenth digit of pi is odd” into the AI as an axiom, because we don’t yet know that this scenario will occur.
2. We also can’t just tell the AI “I am conscious and I have observed Alpha Centauri as not purple”, because presumably when Omega was predicting the AI’s decision a million years ago, it was predicting the AI’s output when given “I am conscious and I have observed Alpha Centauri as not purple” as part of its input.
3. What we can do is give the AI a utility function that does not terminally value beings who are living in a universe with a purple Alpha Centauri.
Do you agree with the above reasoning? If so, we can go on to talk about whether doing 3 is a good idea or not. Or do you have some other method in mind?
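For concreteness, here is a minimal sketch of what option 3 might look like, written as plain Python over a toy representation of world histories; the dict encoding and both helper predicates are assumptions I'm making purely for illustration, not a real design:

# A toy sketch: give zero terminal value to any world history in which
# Alpha Centauri is purple. Both helper functions are hypothetical.

def alpha_centauri_is_purple(world_history):
    # Hypothetical predicate over complete world histories.
    return world_history.get("alpha_centauri_purple", False)

def welfare_of_beings(world_history):
    # Hypothetical aggregate welfare of the beings living in the history.
    return world_history.get("total_welfare", 0.0)

def utility(world_history):
    # Option 3: beings in a purple-Alpha-Centauri universe carry no terminal
    # value, so such histories contribute nothing to the AI's utility.
    if alpha_centauri_is_purple(world_history):
        return 0.0
    return welfare_of_beings(world_history)

# Example: a purple-Alpha-Centauri history is worth nothing to this AI,
# however many happy beings it contains.
assert utility({"alpha_centauri_purple": True, "total_welfare": 1e9}) == 0.0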
BTW, I find it helpful to write down such problems as world programs so I can see the whole structure at a glance. This is not essential to the discussion, but if you don’t mind I’ll reproduce it here for my own future reference.
def P():
    if IsEven(Pi(10^100)):
        # Even branch: Omega acts on its prediction of S's answer.
        if OmegaPredict(S, "Here's a button... Alpha Centauri does not look purple.") = "press":
            MakeAlphaCentauriPurple()
        else:
            DestroyEarth()
    else:
        # Odd branch: S is actually asked, a million years later.
        LetUniverseRun(10^6 years)
        if S("Here's a button... Alpha Centauri does not look purple.") = "press":
            DestroyEarth()
    LetUniverseRun(forever)
Then, assuming our AI can’t compute Pi(10^100), we have:
U(“press”) = 0.5 * U(universe runs forever with Alpha Centauri purple) + 0.5 * U(universe runs for 10^6 years then Earth is destroyed)
U(“not press”) = 0.5 * U(Earth is destroyed right away) + 0.5 * U(universe runs forever)
And clearly U(“not press”) > U(“press”) if U(universe runs forever with Alpha Centauri purple) = U(Earth is destroyed right away) = 0.
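To make the arithmetic explicit, here is a minimal sketch in plain Python; the outcome labels and the particular utility numbers are placeholders I'm choosing for illustration, constrained only by the assumptions in the text:

# Placeholder utilities: the two outcomes the text sets to 0 are 0, and
# "universe runs forever" is worth more than "10^6 years then Earth destroyed".
U = {
    "forever, Alpha Centauri purple": 0.0,
    "10^6 years, then Earth destroyed": 1.0,
    "Earth destroyed right away": 0.0,
    "universe runs forever": 10.0,
}

# The AI can't compute Pi(10^100), so it treats IsEven/IsOdd as a fair coin.
EU_press = 0.5 * U["forever, Alpha Centauri purple"] + 0.5 * U["10^6 years, then Earth destroyed"]
EU_not_press = 0.5 * U["Earth destroyed right away"] + 0.5 * U["universe runs forever"]

assert EU_not_press > EU_press  # 5.0 > 0.5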
Thanks for your answer! First, since it’s been a while since I posted this: I’m not sure my reasoning in this post is correct, but it does still seem right to me. I’d now gloss it as follows: in a Counterfactual Mugging there really is a difference between the best course of action given your information yesterday and the best course of action given your information today. Yes, acting time-inconsistently is bad, so by all means, do decide to be a timeless decider; but this does not make paying up ideal given what you know today; choosing according to yesterday’s knowledge is just the best of the bad alternatives. (Choosing according to what a counterfactual you would have known a million years ago, OTOH, does not seem to be the best of the bad alternatives.)
That said, to answer your question—if we can assume for the purpose of the thought experiment that we know the source code of the universe, what would seem natural to me would be to program UDT’s “mathematical intuition module” to assign low probability to the proposition that this source code would output a purple Alpha Centauri.
Which is—well—a little fuzzy, I admit, because we don’t know how the mathematical intuition module is supposed to work, and it’s not obvious what it should mean to tell it that a certain proposition (as opposed to a complete theory) should have low probability. But if we can let logical inference and “P is false” stand in for probability and “P is improbable,” we’d tell the AI “the universe program does NOT output a purple Alpha Centauri,” and by simple logic the AI would conclude IsOdd(Pi(10^100)).
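To illustrate that last inference, here is a toy propositional check in plain Python; the encoding of P's relevant behavior as a single implication is my own simplification, and I have added the premise that Omega's prediction is "press", which I take the argument to rely on implicitly:

from itertools import product

# Atoms: even = IsEven(Pi(10^100)); predict_press = OmegaPredict(S, ...) = "press";
# purple = "the universe program outputs a purple Alpha Centauri".
def consistent(even, predict_press, purple):
    knowledge_base = [
        not purple,                                # what we tell the AI
        (not (even and predict_press)) or purple,  # from P's source: even and press-prediction imply purple
        predict_press,                             # assumed: Omega predicts "press"
    ]
    return all(knowledge_base)

# Every truth assignment consistent with the knowledge base has even = False,
# i.e. the AI can conclude IsOdd(Pi(10^100)).
models = [m for m in product([True, False], repeat=3) if consistent(*m)]
assert models and all(not even for even, _, _ in models)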