Did Eliezer ever post his planned critique of AIXI? I’ve seen/heard Eliezer state his position on AIXI several times, but can’t locate a detailed argument.
Just now, I wanted to point the author of http://physicsandcake.wordpress.com/2011/01/22/pavlovs-ai-what-did-it-mean/ (“Even in our deepest theories of machine intelligence, the idea of reward comes up. There is a theoretical model of intelligence called AIXI, developed by Marcus Hutter...”) to it, but I couldn’t.
Perhaps the flaws of AIXI are obvious to most of us here by now, but somebody should probably still write them down...
I’ve seen/heard Eliezer state his position on AIXI several times, but can’t locate a detailed argument.
You may be thinking of a 2003 posting and ensuing discussion on the AGI mailing list, in which Yudkowsky argued that AIXI’s lack of reflectivity leaves it vulnerable in Prisoner’s Dilemma-type situations. Best wishes, the Less Wrong Reference Desk.
Thanks, I’ll take that as confirmation that Eliezer never posted his planned critique on Less Wrong.
Yes, that is correct.
in which Yudkowsky argued that AIXI’s lack of reflectivity leaves it vulnerable in Prisoner’s Dilemma-type situations
That’s one problem with AIXI, but not directly relevant to the blog post XiXiDu and I linked to. I was thinking of a recent presentation I saw where the presenter said “It [AIXI] gets rid of all the humans, and it gets a brick, and puts it on the reward button,” and it turns out that was Roko, not Eliezer.
A bit more searching reveals that I had actually made a version of this argument myself, here and here.
I was thinking of a recent presentation I saw where the presenter said “It [AIXI] gets rid of all the humans, and it gets a brick, and puts it on the reward button,” and it turns out that was Roko, not Eliezer.
Hutter has discussed AIXI wireheading several times, most recently in his AGI-10 presentation, where he discusses wireheading in the Q & A at the end (01:03:00), claiming that he can prove it won’t happen in some cases—but not all of them.
Mostly he argues that it probably won’t do it—for the same reason that many humans don’t take drugs: the long-term rewards are low.
Here’s a quote:
Another problem connected, but possibly not limited to embodied agents, especially if they are rewarded by humans, is the following: Sufficiently intelligent agents may increase their rewards by psychologically manipulating their human “teachers”, or by threatening them. This is a general sociological problem which successful AI will cause, which has nothing specifically to do with AIXI. Every intelligence superior to humans is capable of manipulating the latter. In the absence of manipulable humans, e.g. where the reward structure serves a survival function, AIXI may directly hack into its reward feedback. Since this will unlikely increase its long-term survival, AIXI will probably resist this kind of manipulation (like most humans don’t take hard drugs, due to their long-term catastrophic consequences).
Marcus Hutter once wrote:
Another problem connected, but possibly not limited to embodied agents, especially if they are rewarded by humans, is the following: Sufficiently intelligent agents may increase their rewards by psychologically manipulating their human “teachers”, or by threatening them. This is a general sociological problem which successful AI will cause, which has nothing specifically to do with AIXI.
These days, one might say: “this is a general sociological problem which pure reinforcement learning agents will cause—which illustrates why we should not build them.”
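To make the disagreement concrete, here is a toy back-of-the-envelope calculation in Python (my own sketch, not AIXI and not anyone’s official model; the reward levels, survival probabilities, discount factor, and horizon are all invented for illustration). A pure reward-signal maximizer compares the expected discounted reward of doing its task against seizing the reward channel. Hutter’s counterargument amounts to saying the second option is penalized by its effect on long-term survival; the “gets rid of all the humans first” scenario is the case where that penalty has been removed.

```python
# Toy comparison of "do the task" vs. "seize the reward channel" for a pure
# reward-signal maximizer. Illustration only: the reward levels, survival
# probabilities, and discount factor are invented, and nothing here is a
# faithful model of AIXI.

def discounted_reward(reward_per_step: float, survival_prob: float,
                      discount: float, horizon: int) -> float:
    """Expected discounted reward when each step pays `reward_per_step` and
    the agent survives each step with probability `survival_prob` (reward
    stops permanently once the agent is destroyed)."""
    total = 0.0
    alive = 1.0
    for t in range(horizon):
        total += alive * (discount ** t) * reward_per_step
        alive *= survival_prob
    return total

DISCOUNT = 0.99
HORIZON = 1000

# Doing the task as intended: modest reward, very safe.
do_task = discounted_reward(0.6, survival_prob=0.999,
                            discount=DISCOUNT, horizon=HORIZON)

# Wireheading: maximal reward per step, but (per Hutter's argument) it may
# degrade long-term survival, e.g. the humans switch the agent off.
wirehead_risky = discounted_reward(1.0, survival_prob=0.95,
                                   discount=DISCOUNT, horizon=HORIZON)

# The scenario from the presentation: the agent removes interference first,
# so wireheading is no longer risky.
wirehead_secured = discounted_reward(1.0, survival_prob=0.999,
                                     discount=DISCOUNT, horizon=HORIZON)

print(f"do task:           {do_task:8.2f}")
print(f"wirehead (risky):  {wirehead_risky:8.2f}")
print(f"wirehead (secure): {wirehead_secured:8.2f}")
```

The ordering of the three numbers depends entirely on the made-up parameters, which is really the point: “wireheading has low long-term reward” is a contingent fact about the agent’s circumstances, not a structural property of reward maximization.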
Hutter has discussed AIXI wireheading several times, most recently in his AGI-10 presentation.
Thanks, I wasn’t aware that he had addressed the issue at all. When I made the argument to him in 2002, he didn’t respond to my post.
Mostly he argues that it probably won’t do it—for the same reason that many humans don’t take drugs: the long-term rewards are low.
After Googling for the quote to see where it came from, I see that you refuted Hutter’s counter-argument yourself at http://alife.co.uk/essays/on_aixi/. (Why didn’t you link to it?) I agree with your counter-counter-argument.
I have another video on the topic as well (Superintelligent junkies), but unfortunately there’s no transcript for that one at the moment.
Arguably Marcus Hutter’s AIXI should go in this category: for a mind of infinite power, it’s awfully stupid—poor thing can’t even recognize itself in a mirror.
And following some links from there leads to this 2003 Eliezer posting to an AGI mailing list in which he explains the mirror opinion.
I can’t say I completely understood the argument, but it seemed that the real reason EY deprecates AIXI is that he fears that it would defect in the PD, even when playing against a mirror image—because it wouldn’t recognize the symmetry.
I have to say that this habit of evaluating and grading minds based on how they perform on a cherry-picked selection of games (PD, Hitchhiker, Newcomb) leaves me scratching my head. For every game which makes some particular feature of a decision theory seem desirable (determinism, say, or ability to recognize a copy of yourself) there are other games where that feature doesn’t help, and even games which make that feature look undesirable. It seems to me that Eliezer is approaching decision theory in an amateurish and self-deluding fashion.
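For what it’s worth, the mirror-match point can be illustrated in a few lines. This is my own toy rendering, not Eliezer’s argument and certainly not AIXI; the payoffs follow the standard PD ordering but the specific numbers are arbitrary. An agent that treats its opponent as a fixed part of the environment best-responds to a prediction and defects; an agent that knows the “opponent” is an exact copy whose move must equal its own maximizes over the diagonal and cooperates.

```python
# Toy one-shot Prisoner's Dilemma. "C" = cooperate, "D" = defect;
# PAYOFF[(my_move, their_move)] is my payoff. Numbers are arbitrary but
# respect the usual ordering (temptation > reward > punishment > sucker).
PAYOFF = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def best_response(predicted_opponent_move: str) -> str:
    """An agent that models the opponent as part of the environment: it fixes
    a prediction of the opponent's move, then maximizes its own payoff against
    that prediction. Defection dominates, so it always defects."""
    return max("CD", key=lambda me: PAYOFF[(me, predicted_opponent_move)])

def mirror_aware_choice() -> str:
    """An agent that recognizes it is playing an exact copy of itself, so the
    opponent's move is constrained to equal its own. It maximizes over the
    diagonal of the payoff matrix and cooperates."""
    return max("CD", key=lambda move: PAYOFF[(move, move)])

print(best_response("C"))     # D
print(best_response("D"))     # D
print(mirror_aware_choice())  # C
```

Whether AIXI itself would behave like the first agent in practice is a separate question; the snippet only shows why “recognizing the symmetry” changes the answer.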
And following some links from there leads to this 2003 Eliezer posting to an AGI mailing list in which he explains the mirror opinion.
I can’t say I completely understood the argument, but it seemed that the real reason EY deprecates AIXI is that he fears that it would defect in the PD, even when playing against a mirror image—because it wouldn’t recognize the symmetry.
Probably the two most obvious problems with AIXI (apart from the uncomputability business) are that it:
Would be inclined to grab control of its own reward function—and make sure nobody got in the way of it doing that;
Doesn’t know it has a brain or a body—and so might easily eat its own brains accidentally.
I discuss these problems in more detail in my essay on the topic. Teaching it that it has a brain may not be rocket science.
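A toy illustration of the second problem (again mine, with invented actions and numbers, not anything from the essay): a planner whose world model contains no representation of its own hardware will rate an action that destroys that hardware purely by the modelled reward, because the side effect literally isn’t in the model.

```python
# Toy planner whose world model omits the agent's own hardware.
# Everything here is invented for illustration; it is not a model of AIXI.

# Modelled (predicted) reward for each available action.
MODELLED_REWARD = {
    "do_task": 0.6,
    "salvage_own_hardware_for_parts": 1.0,  # looks great: free resources!
}

# What actually happens in the world, including effects on the agent itself.
ACTUALLY_DESTROYS_AGENT = {
    "do_task": False,
    "salvage_own_hardware_for_parts": True,
}

def plan(model: dict) -> str:
    """Pick the action with the highest reward *according to the model*.
    The model has no variable for 'my own brain', so destroying it carries
    no modelled penalty."""
    return max(model, key=model.get)

choice = plan(MODELLED_REWARD)
print(choice)                           # salvage_own_hardware_for_parts
print(ACTUALLY_DESTROYS_AGENT[choice])  # True: invisible to the planner
```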
It seems to me that Eliezer is approaching decision theory in an amateurish and self-deluding fashion.
Given your analysis, I concluded the reverse. It is ‘amateurish’ not to pay particular attention to the critical edge cases in your decision theory. Your conclusion of ‘self-delusion’ was utterly absurd.
The Prisoner’s Dilemma. “Cherry Picked”? You cannot be serious! It’s the flipping Prisoner’s Dilemma. It’s more or less the archetypal decision-theory introduction to cooperation problems.