These are all task-specific, problem-definition issues that occurred while fine-tuning algorithms (but, yes, they do show how things could get out of hand).
Humans already do this very well; for example, tax loopholes that are exploited but are not in the 'spirit of the law'.
The ideal (but incredibly difficult) solution would be for AIs to have multiple layers of abstraction, where each decision gets passed up and evaluated against questions like "is this really what they wanted?" or "am I just gaming the system?".
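To make that concrete, here is a minimal Python sketch of the layered-check idea; the layer functions and the string-matching tests are purely hypothetical stand-ins, since writing checks that actually capture "what they wanted" is the hard part:

```python
# Toy sketch: every proposed action is passed up through a stack of
# oversight checks, and is only approved if every layer signs off.
# The layer names and string tests below are hypothetical illustrations.

from typing import Callable, List

Check = Callable[[str], bool]  # a check inspects a description of a proposed action


def layered_approval(action: str, layers: List[Check]) -> bool:
    """Pass the action up through each layer; reject on the first veto."""
    for check in layers:
        if not check(action):
            return False
    return True


# Crude proxy for "is this really what they wanted?"
def matches_stated_goal(action: str) -> bool:
    return "pause the game forever" not in action


# Crude proxy for "am I just gaming the system?"
def not_gaming_the_metric(action: str) -> bool:
    return "exploit scoring bug" not in action


if __name__ == "__main__":
    layers = [matches_stated_goal, not_gaming_the_metric]
    print(layered_approval("move piece to clear the level", layers))  # True
    print(layered_approval("pause the game forever", layers))         # False
```

Each layer here is just a predicate over a text description of the action; a real system would have to evaluate the action's actual consequences, which is where all the difficulty lives.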
What happens if an AI manages to game the system despite the n layers of abstraction?
This is the fundamental problem being researched: the top layer of abstraction would be that difficult-to-define one called "Be Friendly".
Instead of friendly AI, maybe we should look at "don't be an asshole" AI (DBAAAI); this may be simpler to test and monitor.
Let me clarify why I asked. I think the "multiple layers of abstraction" idea is essentially "build in a lot of 'manual' checks that the AI isn't misbehaving", and I don't think that is a desirable, or even possible, solution. You can write n layers of checks, but how do you know that you don't need n+1?
The idea, as has been pointed out here on LW, is that what you really want and need is a mathematical model of morality, which the AI will implement and from which moral behaviour will fall out without your having to specify it explicitly. This is what MIRI are working on with CEV & co.
Whether CEV, or whatever emerges as the best model to use, is gameable is itself a mathematical question,[1] central to the FAI problem.
[1] There are also implementation details to consider, e.g. “can I mess with the substrate” or “can I trust my substrate”.