Eliezer Yudkowsky comments on The Hidden Complexity of Wishes

Eliezer Yudkowsky 31 Aug 2013 21:39 UTC
3 points
The trouble is that communicating with a human or helping them build the real FAI in any way is going to strongly perturb the world. So actually getting anything useful this way requires solving the problem of which changes to humans, and consequent changes to the world, are allowed to result from your communication-choices.
- Kawoomba 31 Aug 2013 21:55 UTC
  0 points
  Parent
  
  helping them build the real FAI
  
  Except it’s not, as far as the artificial agent is concerned:
  
  Its goals are strictly limited to “develop your models using the minimal actions possible [even ‘just parse the internet, do not use anything beyond wget’ could suffice], after x number of years have passed, accept new goals from y source.” The new goals could be anything. (It could even be a boat!).
  
  The usefulness regarding FAI becomes evident only at that latter stage, stemming from the foom’ed AI’s models being used to parse the new goals of “do that which I’d want you to do”. It’s sidestepping the big problem (aka “cheating”), but so what?
  - Eliezer Yudkowsky 31 Aug 2013 23:28 UTC
    8 points
    Parent
    It’s allowed to emit arbitrary HTTP GETs? You just lost the game.
    - Kawoomba 1 Sep 2013 6:41 UTC
      1 point
      Parent
      Ah, you mean because you can invoke e.g. php functions with wget / inject SQL code, thus gaining control of other computers etc.?
      
      A more sturdy approach to just get data would be to only allow it to passively listen in on some Tier 1 provider’s backbone (no manipulation of the data flow other than mirroring packets, which is easy to formalize). Once that goal is formulated, the agent wouldn’t want to circumvent it.
      
      Still seems plenty easier to solve than “friendliness”, as is programming it to ask for new goals after x time. Maintaining invariants under self-modification remains, as a task.
      
      It’s not fruitful for me to propose implementations (even though I just did, heh) and for someone else to point out holes (I don’t mean to solve that task in 5 minutes), same as with you proposing full-fledged implementations for friendliness and for someone else to point out holes. Both are non-trivial tasks.
      
      My question is this: given your current interpretation of both approaches (“passively absorb data, ask for new goals after x time” vs. “implement friendliness in the pre-foomed agent outright”), which seems more manageable while still resulting in an FAI?