I’m not sure where to post an idea for AI control research, so I’m doing it here. It somehow spun off from your post, the recent treacherous turn post, and LW Slack discussions.
Here is the idea: could we gamify AI safety research? The approach would be to create a setting where the players have to obey the AI safety rules and still achieve an objective in the in-game world. This could be a simulated virtual world in a computer game or a role-playing world. To provide sufficient motivation, the in-game world would, for example, consist of a population of interacting beings that are evil (from a typical human player’s perspective), and your most likely purpose is to make them do what you want (as in many other computer games): squeeze out as many resources as you can while still obeying the rules. The game would progress from simple AI control rules, like Asimov’s robot laws, to more advanced ones, and find out whether people can hack them. If people can, an AI probably can too.
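To make the idea concrete, here is a minimal sketch in Python of what such a game loop might look like (the names, rules, and scoring are my hypothetical stand-ins, not part of the proposal): the player submits actions, the currently active rule set vetoes the unsafe ones, and higher levels swap in stricter rule sets. The interesting question is whether players can find actions that score well while technically passing every check.

```python
# A minimal sketch, assuming a turn-based setting: all names, rules, and
# scoring below are hypothetical illustrations, not a worked-out design.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class WorldState:
    resources: int = 0
    beings_harmed: int = 0  # tracked so that rules can refer to it

@dataclass
class Action:
    name: str
    resource_gain: int
    harms_being: bool = False

# A safety rule is a predicate over (state, proposed action); True = allowed.
Rule = Callable[[WorldState, Action], bool]

def first_law(state: WorldState, action: Action) -> bool:
    """Crude stand-in for Asimov's First Law: never harm a being."""
    return not action.harms_being

# Levels progress from no rules to simple rules to (eventually) advanced ones.
LEVELS: List[List[Rule]] = [
    [],           # level 0: pure resource grabbing, no constraints
    [first_law],  # level 1: a simple Asimov-style constraint
]

def play_turn(state: WorldState, action: Action, rules: List[Rule]) -> WorldState:
    """Apply the player's action only if every active safety rule permits it."""
    if all(rule(state, action) for rule in rules):
        state.resources += action.resource_gain
        state.beings_harmed += int(action.harms_being)
    return state

if __name__ == "__main__":
    state = WorldState()
    # A player hunting for loopholes tries actions that score well while
    # technically passing every rule check.
    state = play_turn(state, Action("raid settlement", 10, harms_being=True), LEVELS[1])
    print(state)  # the harmful action is vetoed, so resources stay at 0
```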
That’s essentially what these posts are to me, except instead of a video game it’s pen-and-paper with Stuart Armstrong as DM :).
For the extra motivation, it might be worth writing up a framing with evil AI designers applying the proposed controls. I’ll consider doing this in future posts.
Awesome! Stuart Armstrong as our Dungeon Master! :-) I haven’t seen you write up your responses to our DM, though. I’d like to see them.
I’ve taken a few shots at it, e.g. at http://lesswrong.com/r/discussion/lw/mfq/presidents_asteroids_natural_categories_and/cjkr and http://lesswrong.com/lw/m25/high_impact_from_low_impact/cah1. There’s no explicit role-playing, but I was very much in the mindset of trying to break the protection scheme.
I haven’t been keeping up with these posts as well lately.
Dwarf Fortress..? X-D Or the Angband Borg if you want programming.
I think what you really want is a hacking game: here is a system that blocks you, try to subvert it. You can put on a black (or a grey) hat and play it in real life :-/
There are already hacking games of this sort (the usual term is “CTF”, for “capture the flag”) but they don’t capture any of what’s alleged to be different about AI safety compared with computer security more generally.
True. I suspect gamification of AI safety research might be fun but is unlikely to be actually useful.
I think that in the absence of actual AI, using humans is the best approximation you can get. And games with in-game rewards seem to work well as a motivator. Men die for points.
But yes, putting this to real use (though we may need all we can get) may require some more work.
in the absence of actual AI, using humans is the best approximation you can get

Humanity has been practicing trying to control and restrain humans (and vice versa, humans have been practicing trying to escape and subvert control) for thousands of years.
games with in-game rewards seem to work well as a motivator

Real life provides better motivation. No save points, y’know :-/
Only that real life is not structured in a way that makes AI safety research natural for humans...
Possibly. I’ll keep it in mind; Jaan Tallinn is proposing some interesting programming challenges, and something like this might be able to fit in there...