You cannot successfully trick/fight/outsmart a superintelligence. Your contingency plans would look clumsy and transparent to it. Even laughable, if it has a sense of humor. If a self-modifiable intelligence finds that its initial risk aversion or discount rate does not match its models of the world, it will fix this programming error and march on. The measures you suggest might only work for a moderately smart agent unable to recursively self-improve.
I know that they cannot be tricked. And discount rates are about motivations, not about models of the world.
Plus, I envisage this being used rather early in the development of intelligence, as a test for putative utilities/motivations.
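A toy sketch of the distinction I mean (my own illustration, not anything from the post; a standard discounted-reward setup): the discount rate sits in the objective, next to the reward function, while the “model of the world” is the separate transition model, so there is no fact about the world that the discount rate could fail to match.

```python
# Toy sketch: gamma lives on the motivation side of a discounted-reward agent.
def discounted_value(trajectory_rewards, gamma):
    """Objective side: how much the agent cares about later rewards."""
    return sum(gamma ** t * r for t, r in enumerate(trajectory_rewards))

# Model side: beliefs about what happens; no discount rate appears anywhere here.
world_model = {
    # (state, action) -> list of (probability, next_state, reward)
    ("start", "wait"): [(1.0, "start", 0.0)],
    ("start", "act"):  [(0.5, "good", 10.0), (0.5, "bad", -1.0)],
}

# Two agents with the same world model but different gammas simply want
# different things; neither of them has a mistaken belief it would "fix".
print(discounted_value([0, 0, 10], gamma=0.5))   # 2.5
print(discounted_value([0, 0, 10], gamma=0.99))  # 9.801
```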
Do you mind elaborating on the expected AI capabilities at that point?
Don’t know if it’s all that useful, but let’s try...
I imagine the AI still being boxed, and that we can still modify its motivational structure (I have a post coming up on how to do that so that the AI doesn’t object/resist). And that’s about it. I’ve tried to keep it as general as possible, so that it could also be used on AI designs made by different groups.
What’s our definition of “trick”, in this context? For the simplest example: when we hook AIXI-MC up to the controls of Pac-Man and watch what it does, are we technically “tricking” it into thinking that the universe contains nothing but mazes, ghosts, and pellets?
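For concreteness, here is a minimal sketch of the setup I mean (hypothetical names; the real MC-AIXI-CTW agent is far more involved). The point is that the agent’s “universe” is whatever percept stream the harness feeds it: if the only observations it ever receives come from the Pac-Man emulator, then mazes, ghosts, and pellets exhaust its world, whether or not we call that a “trick”.

```python
import random

class PacManEnv:
    """Stand-in for the game: emits (observation, reward) in response to an action."""
    def step(self, action):
        observation = f"maze_frame_after_{action}"
        reward = random.choice([0, 1, 10])  # nothing, pellet, ghost eaten...
        return observation, reward

class Agent:
    """Stand-in for AIXI-MC: it can only learn and plan over the percepts it is given."""
    def __init__(self):
        self.history = []
    def act(self):
        return random.choice(["up", "down", "left", "right"])
    def update(self, action, observation, reward):
        self.history.append((action, observation, reward))

env, agent = PacManEnv(), Agent()
for _ in range(5):
    a = agent.act()
    o, r = env.step(a)
    agent.update(a, o, r)  # everything the agent will ever "know" passes through here
```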
If we can’t instill any values at all, we’re screwed regardless of what we do. Designs that change their values in order to win more resources are UFAI by definition.
The degree of risk aversion is not a value.
I’m not confident that risk aversion and discount rate aren’t tied into values.
I am not 100% confident, either. I guess we’ll have to wait for someone more capable to do a simulation or a calculation.
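For example, the sort of toy calculation I have in mind (standard expected-utility setup, nothing AI-specific): what we call “risk aversion” can just be the concavity of the utility function, which is exactly why it is hard to treat it as separate from the values.

```python
import math

gamble = [(0.5, 0.0), (0.5, 100.0)]  # 50/50 chance of $0 or $100
sure_thing = 50.0                    # same expected money as the gamble

def expected_utility(lottery, u):
    return sum(p * u(x) for p, x in lottery)

u_linear  = lambda x: x                # "risk-neutral" values
u_concave = lambda x: math.log(x + 1)  # "risk-averse" values (diminishing returns)

# The linear-utility agent is indifferent; the concave-utility agent prefers the
# sure thing -- purely because of the shape of u, i.e. because of what it values.
print(expected_utility(gamble, u_linear),  u_linear(sure_thing))   # 50.0 50.0
print(expected_utility(gamble, u_concave), u_concave(sure_thing))  # ~2.31 ~3.93
```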
(redundant with Stuart’s replies)