You said:
Causal decision theorists don’t self-modify to timeless decision theorists. If you get the decision theory wrong, you can’t rely on it repairing itself.
but you also said:
...if you build an AI that two-boxes on Newcomb’s Problem, it will self-modify to one-box on Newcomb’s Problem, if the AI considers in advance that it might face such a situation.
I can envision several possibilities:
Perhaps you changed your mind and presently disagree with one of the above two statements.
Perhaps you didn’t mean a causal AI in the second quote. In that case I have no idea what you meant.
Perhaps Newcomb’s problem is the wrong example, and there’s some other example motivating TDT that a self-modifying causal agent would deal with incorrectly.
Perhaps you have a model of causal decision theory that makes self-modification impossible in principle. That would make your first statement above true, in a useless sort of way, so I hope you didn’t mean that.
Would you like to clarify?
Causal decision theorists self-modify to one-box on Newcomb’s Problem with Omegas that looked at their source code after the self-modification took place; i.e., if the causal decision theorist self-modifies at 7am, it will self-modify to one-box with Omegas that looked at the code after 7am and two-box otherwise. This is not only ugly but also has worse implications for e.g. meeting an alien AI who wants to cooperate with you, or worse, an alien AI that is trying to blackmail you.
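A minimal worked payoff comparison makes that asymmetry concrete (illustrative numbers; the standard Newcomb stakes of $1,000,000 in box B when one-boxing is predicted and $1,000 in box A are assumed, with a reliable predictor):

```latex
% Payoffs as seen by a CDT agent deciding, at 7am, whether to rewrite itself
% to one-box (illustrative numbers; standard Newcomb stakes assumed).
\begin{align*}
\text{Omega scans after 7am:}\quad & EU(\text{rewrite to one-box}) = 1{,}000{,}000,
  \quad EU(\text{keep two-boxing}) = 1{,}000\\
\text{Omega scanned before 7am:}\quad & EU(\text{one-box}) = b,
  \quad EU(\text{two-box}) = b + 1{,}000 \qquad (b = \text{contents already in box B})
\end{align*}
```

Since the rewrite only causally affects scans that happen after it, the CDT calculation endorses one-boxing in the first case and two-boxing in the second, which is exactly the split described above.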
Bad decision theories don’t necessarily self-repair correctly.
And in general, every time you throw up your hands in the air and say, “I don’t know how to solve this problem, nor do I understand the exact structure of the calculation my computer program will perform in the course of solving this problem, nor can I state a mathematically precise meta-question, but I’m going to rely on the AI solving it for me ’cause it’s supposed to be super-smart,” you may very possibly be about to screw up really damned hard. I mean, that’s what Eliezer-1999 thought you could say about “morality”.
Okay, thanks for confirming that Newcomb’s problem is a relevant motivating example here.
“I don’t know how to solve this problem, nor do I understand the exact structure of the calculation my computer program will perform in the course of solving this problem, nor can I state a mathematically precise meta-question, but I’m going to rely on the AI solving it for me ’cause it’s supposed to be super-smart,”
I’m not saying that. I’m saying that self-modification solves the problem, assuming the CDT agent moves first, and that it seems simple enough that we can check that a not-very-smart AI solves it correctly on toy examples. If I get around to attempting that, I’ll post to LessWrong.
Assuming the CDT agent moves first seems reasonable. I have no clue whether or when Omega is going to show up, so I feel no need to second-guess the AI about that schedule.
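The toy check described above might look something like the following sketch (an editorial illustration, under the assumption that Omega scans the agent's code only after the self-modification step):

```python
# Toy check of the claim above (an editorial sketch, not code from the thread):
# the CDT agent moves first, i.e. it rewrites its policy before Omega scans it,
# so the rewrite is an ordinary causal act with an ordinary causal payoff.

STAKES = {"box_a": 1_000, "box_b_if_one_box_predicted": 1_000_000}

def simulate(policy):
    """Timeline: (1) agent self-modifies to `policy`, (2) Omega scans the
    current code and fills the boxes, (3) the agent acts on `policy`."""
    prediction = policy  # the scan happens after the rewrite, so it sees `policy`
    box_b = STAKES["box_b_if_one_box_predicted"] if prediction == "one-box" else 0
    return box_b if policy == "one-box" else box_b + STAKES["box_a"]

def cdt_choose_rewrite(candidate_policies):
    # CDT step: compare the causal consequences of each candidate rewrite
    # and keep the one with the highest simulated payoff.
    return max(candidate_policies, key=simulate)

print(cdt_choose_rewrite(["one-box", "two-box"]))  # prints "one-box"
```

Running the same evaluation against an Omega that scanned before the rewrite would leave two-boxing in place, which is the asymmetry Eliezer points to above.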
(Quoting out of order)
This is not only ugly...
As you know, we can define a causal decision theory agent in one line of math. I don’t know a way to do that for TDT. Do you? If TDT could be concisely described, I’d agree that it’s the less ugly alternative.
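For concreteness, the usual one-line CDT rule looks something like the following (the do-notation form is an assumption about which formulation is meant; equivalent counterfactual formulations exist):

```latex
% A one-line CDT agent (do-notation form; one of several equivalent writings).
\[
a^{*} \;=\; \operatorname*{argmax}_{a \,\in\, \text{Actions}} \;\sum_{o \,\in\, \text{Outcomes}} P\!\left(o \mid \mathrm{do}(a)\right)\, U(o)
\]
```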
but also has worse implications for e.g. meeting an alien AI who wants to cooperate with you, or worse, an alien AI that is trying to blackmail you.
I’m failing to suspend disbelief here. Do you have motivating examples for TDT that seem likely to happen before Kurzweil’s schedule for the Singularity causes us to either win or lose the game?
As you know, we can define a causal decision theory agent in one line of math.
If you appreciate simplicity/elegance, I suggest looking into UDT. UDT says that when you’re making a choice, you’re deciding the output of a particular computation, and the consequences of any given choice are just the logical consequences of that computation having that output.
CDT in contrast doesn’t answer the question “what am I actually deciding when I make a decision?” nor does it answer “what are the consequences of any particular choice?” even in principle. CDT can only be described in one line of math because the answer to the latter question has to be provided to it via an external parameter.
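A minimal toy rendering of that framing on Newcomb's Problem (an editorial sketch under the assumption that Omega predicts by running the same decision computation; this is not Wei Dai's formal definition):

```python
# Toy rendering of the UDT framing above (an editorial sketch, not Wei Dai's
# formal definition). Here Omega scanned the agent's code *before* the choice;
# its prediction still matches the choice because the prediction is a logical
# consequence of the same decision computation having a given output.

def consequences(output):
    """What follows logically from 'my decision computation returns `output`'?
    Omega ran that same computation earlier, so its prediction equals `output`."""
    prediction = output
    box_b = 1_000_000 if prediction == "one-box" else 0
    box_a = 1_000
    return box_b if output == "one-box" else box_b + box_a

def udt_decide(candidate_outputs):
    # Pick the output whose logical consequences have the highest utility.
    return max(candidate_outputs, key=consequences)

print(udt_decide(["one-box", "two-box"]))  # prints "one-box"
```

Unlike the CDT sketch earlier, nothing here depends on when Omega scanned the code; the prediction tracks the choice as a logical consequence rather than a causal one.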
but also has worse implications for e.g. meeting an alien AI who wants to cooperate with you, or worse, an alien AI that is trying to blackmail you.
I’m failing to suspend disbelief here. Do you have motivating examples for TDT that seem likely to happen before Kurzweil’s schedule for the Singularity causes us to either win or lose the game?
I’m reasonably sure Eliezer meant implications for the would-be friendly AI meeting alien AIs. That could happen at any time in the remaining life span of the universe.
Thanks, I’ll have a look at UDT.
I certainly agree there.
I don’t know a way to do that for TDT. Do you?
Maybe this one: “Argmax[A in Actions] in Sum[O in Outcomes] (Utility(O)*P(this computation yields A []-> O|rest of universe))”
From this post.
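Spelled out in more conventional notation, the quoted line reads roughly as follows (an editorial transcription; “[]->” is the counterfactual arrow):

```latex
% The quoted TDT one-liner, transcribed into standard notation.
\[
\operatorname*{argmax}_{A \,\in\, \text{Actions}} \;\sum_{O \,\in\, \text{Outcomes}}
  U(O)\cdot P\bigl(\text{this computation yields } A \;\Box\!\!\rightarrow O \;\big|\; \text{rest of universe}\bigr)
\]
```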