I’m not so sure that an AI suggesting murder is clear evidence of it being unfriendly. After all, it might have good reason to believe that if it doesn’t stop a certain researcher ASAP and at all costs, then humanity is doomed. One way around that is to assign infinite positive value to human life, but can you really expect CEV to be handicapped in such a manner?
p(UFAI) > p(Imminent, undetected catastrophe that only a FAI can stop)
Given that UFAI results in “human extinction”, and my CEV assigns effectively infinite DISutility to that outcome, the AI would FIRST have to provide sufficient evidence for me to update toward the catastrophe being more likely than UFAI.
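To make that concrete, here is a minimal sketch (Python, with made-up numbers and hypothetical names, purely illustrative): because both outcomes carry the same effectively infinite disutility, that term cancels out of the comparison, and the decision reduces to which probability is larger.

```python
# Purely illustrative: compare the expected disutility of complying with the
# AI's murder suggestion vs. refusing it, when extinction carries an
# effectively "infinite" (very large) disutility.

EXTINCTION_DISUTILITY = 1e12  # stand-in for "effectively infinite" disutility

def expected_disutility(p_ufai: float, p_catastrophe: float, comply: bool) -> float:
    """If the AI is unfriendly (prob p_ufai), complying leads to extinction.
    If the catastrophe is real (prob p_catastrophe), refusing leads to extinction."""
    if comply:
        return p_ufai * EXTINCTION_DISUTILITY
    return p_catastrophe * EXTINCTION_DISUTILITY

# Prior: UFAI is judged more likely than an imminent, undetected catastrophe.
p_ufai, p_catastrophe = 0.10, 0.01
print(expected_disutility(p_ufai, p_catastrophe, comply=True))   # 1e11 -> worse
print(expected_disutility(p_ufai, p_catastrophe, comply=False))  # 1e10 -> better

# The extinction term is common to both branches, so the choice hinges entirely
# on whether the AI can supply evidence that pushes p(catastrophe) above p(UFAI).
```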
I’ve already demonstrated that an AI which can do exactly that will get more leniency from me :)