Some villain then asks how to reliably destroy the world, and follows the given answer.
Alternatively: A philosopher asks for the meaning of life, and the Oracle returns an extremely persuasive answer which convinces most people that life is worthless.
Another alternative: After years of excellent work, the Oracle gains so much trust that people finally implement the possibility of asking less formal questions, such as “how to maximise human utility”, and then follow the given advice. Unfortunately (but not surprisingly), an unnoticed mistake in the definition of human utility has slipped through the safety checks.
Yes, that’s the main difficulty behind friendly AI in general. This does not constitute a specific way that it could go wrong.
Oh, sure. My only intention was to show that limiting the AI’s power to mere communication doesn’t imply safety. There may be thousands of specific ways it could go wrong. For instance:
The Oracle answers that human utility is maximised by wireheading everybody to become a happiness automaton, and that it is a moral duty to do this to others even against their will. Most people believe the Oracle (because its previous answers have always proved true and useful, and moreover it makes really neat PowerPoint presentations of its arguments) and wireheading becomes compulsory. After the minority of dissidents is defeated, all mankind turns into happiness automata and happily dies out a while later.