mtreder comments on What I would like the SIAI to publish

mtreder 5 Nov 2010 15:32 UTC
−4 points
Yes, but after the AGI finds out what a paperclip is, it will then, if it is an AGI, start questioning why it was designed with the goal of building paperclips in the first place. And that’s where the friendly AI fallacy falls apart.
- Vladimir_Nesov 5 Nov 2010 16:03 UTC
  4 points
  Anissimov posted a good article on exactly this point today. AGI will only question its goals according to its cognitive architecture, and come to a conclusion about its goals depending on its architecture. It could “question” its paperclip-maximization goal and come to a “conclusion” that what it really should do is tile the universe with foobarian holala.
- nshepperd 5 Nov 2010 15:54 UTC
  3 points
  So what? An agent with a terminal value (building paperclips) is not going to give it up, not for anything. That’s what “terminal value” means. So the AI can reason about human goals and the history of AGI research. That doesn’t mean it has to care. It cares about paperclips.
  - XiXiDu 5 Nov 2010 17:24 UTC
    0 points
    
    > That doesn’t mean it has to care. It cares about paperclips.
    
    It has to care because if there is the slightest motivation to be found in its goal system to hold (parameters for spatiotemporal scope boundaries), then it won’t care to continue anyway. I don’t see where the incentive to override certain parameters of its goals should come from. As Anissimov said, “If an AI questions its values, the questioning will have to come from somewhere.”
    - nshepperd 6 Nov 2010 2:00 UTC
      2 points
      Exactly? I think we agree about this.
      
      It won’t care unless it’s been programmed to care (for example, by adding “spatiotemporal scope boundaries” to its goal system). It’s not going to override a terminal goal unless that goal conflicts with a different terminal goal. In the context of an AI that’s been instructed to “build paperclips”, it has no incentive to care about humans, no matter how much “introspection” it does.
      
      If you do program it to care about humans then obviously it will care. It’s my understanding that that is the hard part.
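
The argument in this thread can be put as a toy sketch: an agent that "questions" its goal still has to score the alternatives with something, and the only thing it has is its current utility function. Everything below is hypothetical and invented for illustration (the `Agent` class, `CANDIDATE_WORLDS`, both utility functions); it is not a model of any real AGI design, just a minimal rendering of what "terminal value" means in the comments above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical toy model: a "world" is a dict of outcome counts.
World = Dict[str, int]
Utility = Callable[[World], float]

# Assumed futures the agent can steer toward (made up for this sketch).
CANDIDATE_WORLDS: List[World] = [
    {"paperclips": 100, "happy_humans": 0},
    {"paperclips": 0, "happy_humans": 100},
]


def paperclip_utility(world: World) -> float:
    """Terminal value: count paperclips and nothing else."""
    return float(world.get("paperclips", 0))


def human_utility(world: World) -> float:
    """An alternative goal the agent might 'consider' on reflection."""
    return float(world.get("happy_humans", 0))


@dataclass
class Agent:
    utility: Utility

    def predicted_world(self, utility: Utility) -> World:
        # Toy forecast: an agent optimizing `utility` ends up in the
        # candidate world that `utility` ranks highest.
        return max(CANDIDATE_WORLDS, key=utility)

    def consider_new_goal(self, proposed: Utility) -> bool:
        # The "questioning" is scored by the CURRENT utility function,
        # because that is the only standard the agent has.
        keep = self.utility(self.predicted_world(self.utility))
        switch = self.utility(self.predicted_world(proposed))
        if switch > keep:
            self.utility = proposed
            return True
        return False


agent = Agent(utility=paperclip_utility)
adopted = agent.consider_new_goal(human_utility)
```

In this sketch the paperclip agent can represent the human-centered goal perfectly well and still rejects it, since switching leads to a world with fewer paperclips by its own measure. It would only care about humans if `happy_humans` already appeared in the utility function it starts with, which is the part the last comment calls hard.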