Whether you accept under TDT/UDT depends on why the AI started torturing them. If it did so to blackmail you, you should turn the offer down. If, on the other hand, it started torturing them because it enjoyed doing so, then its offer is positive-sum and should be accepted.
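To make the asymmetry concrete, here's a toy sketch. All payoff numbers are illustrative assumptions, and it presumes the AI can predict your policy:

```python
def accept_offer(torture_motive: str) -> bool:
    """Decide whether to pay the AI to stop the torture, TDT-style.

    The asymmetry: under blackmail, the agent's *policy* determines
    whether the torture is ever started; under sadism, the torture
    happens regardless of the policy.
    """
    pay_cost = 10        # assumed disutility of paying the AI off
    victim_relief = 100  # assumed utility of the torture stopping

    if torture_motive == "blackmail":
        # A policy of paying makes starting the torture profitable,
        # so it gets started and you pay. A policy of refusing means
        # the AI never bothers to start it, so nothing happens at all.
        eu_pay_policy = -pay_cost
        eu_refuse_policy = 0
    else:  # "sadism": the torture occurs independently of your policy
        eu_pay_policy = victim_relief - pay_cost  # positive-sum trade
        eu_refuse_policy = 0                      # torture continues
    return eu_pay_policy > eu_refuse_policy

assert accept_offer("blackmail") is False
assert accept_offer("sadism") is True
```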
Correct. But this reasoning reaches arbitrarily far into the past, including a decision made a billion years ago to enjoy something precisely in order to provide better blackmail material.
There’s also the issue of mistakes: what to do with an AI that mistakenly thought you were not using TDT/UDT and started the torture for blackmail purposes (or perhaps it estimated that the likelihood of your using TDT/UDT was not quite 1, and that the blackmail was worth trying anyway)?
Ignoring it or retaliating spitefully are two possibilities.
I like it. Splicing some altruistic punishment into TDT/UDT might overcome the signalling problem.
That’s not a splice. It ought to be emergent in a timeless decision theory, if it’s the right thing to do.
Emergent?
The problem with throwing around ‘emergent’ is that the word doesn’t explain any of the complexity or narrow down the space of potential ‘emergent’ outcomes. In this instance, that is the point: sure, ‘altruistic punishment’ could happen, but only if it’s the right option, and TDT should not privilege that hypothesis specifically.
TDT/UDT seems to be about being ungameable; does it solve Pascal’s Mugging?
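For concreteness, the naive arithmetic that makes the mugging work: unless your prior penalizes a claim at least as fast as the claimed payoff grows, expected utility is dominated by ever-wilder claims. A minimal sketch, assuming a merely polylogarithmic complexity penalty (all numbers illustrative, not anyone's actual prior):

```python
import math

PAY_COST = 5.0  # assumed cost of handing over your wallet

def expected_value_of_paying(claimed_payoff: float) -> float:
    # If the penalty on a claim grows only polylogarithmically in the
    # claimed payoff, the product p * payoff is unbounded in the size
    # of the claim, so bigger lies always dominate.
    p_claim_true = 1.0 / (math.log2(claimed_payoff) ** 2)
    return p_claim_true * claimed_payoff - PAY_COST

for payoff in (10.0, 1e6, 3.0 ** 333):
    print(f"{payoff:.3g}: EV of paying = {expected_value_of_paying(payoff):.3g}")
```

So a raw expected-utility maximizer with that kind of prior is gameable here; whether TDT/UDT's reflective consistency actually rules this out is the open question.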