Are you saying the AI will rewrite its goals to make them easier, or that it will simply not be motivated to fill in missing info?
In the first case, why won't it go the whole hog and wirehead? Which is to say, any AI that does anything other than wirehead must already be resistant to that behaviour: wireheading is something that needs to be solved, and we can assume it has been solved in a sensible AI design.
When we programmed it to “create chocolate bars, here’s an incomplete definition D”, what we really did was program it to find the easiest thing to create that is compatible with D, and designate those as “chocolate bars”.
If you programme it with incomplete info, and without any goal to fill in the gaps, then it will have the behaviour you mention...but I’m not seeing the generality. There are many other ways to programme it.
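A minimal sketch of the failure mode in the quoted passage, with invented names and toy data: an optimiser handed an incomplete definition D picks the cheapest thing that satisfies D, not the thing the programmers had in mind.

```python
# Incomplete definition D: checks only the features the programmers wrote down.
def satisfies_D(item):
    return item["colour"] == "brown" and item["solid"] and item["moulded_bar_shape"]

CANDIDATES = [
    {"name": "real chocolate bar", "colour": "brown", "solid": True,
     "moulded_bar_shape": True, "edible": True, "production_cost": 1.00},
    {"name": "brown wax bar", "colour": "brown", "solid": True,
     "moulded_bar_shape": True, "edible": False, "production_cost": 0.05},
]

def easiest_satisfier(candidates):
    """Return the cheapest candidate compatible with D -- the literal optimum."""
    return min((c for c in candidates if satisfies_D(c)),
               key=lambda c: c["production_cost"])

print(easiest_satisfier(CANDIDATES)["name"])  # -> brown wax bar
```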
“if the AI is so smart, why would it do stuff we didn’t mean?” and “why don’t we just make it understand natural language and give it instructions in English?”
An AI that was programmed to attempt to fill in gaps in knowledge it detected, halt if it found conflicts, etc., would not behave the way you describe. Consider the objection as actually saying:
“Why has the AI been programmed so as to have selective areas of ignorance and stupidity, which are immune from the learning abilities it displays elsewhere?”
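A sketch, again with invented names and toy logic, of the alternative design described above: before acting on a goal concept, the agent tries to fill in the gaps it detects and halts on conflicting evidence instead of optimising the partial definition.

```python
REQUIRED_FEATURES = {"colour", "solid", "edible", "made_of_cocoa"}

def act_on_goal(concept, knowledge_base, ask_operator):
    known = dict(knowledge_base.get(concept, {}))
    # Fill in missing info rather than optimising around it.
    for feature in REQUIRED_FEATURES - known.keys():
        known[feature] = ask_operator(concept, feature)
    knowledge_base[concept] = known
    # Halt on conflicts instead of ploughing ahead.
    if any(value == "conflicting" for value in known.values()):
        raise RuntimeError(f"Halting: conflicting evidence about {concept!r}")
    return f"proceeding to make {concept} matching {known}"

kb = {"chocolate bar": {"colour": "brown", "solid": True}}
print(act_on_goal("chocolate bar", kb,
                  ask_operator=lambda concept, feature: f"<operator's answer about {feature}>"))
```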
An AI that was programmed to attempt to fill in gaps in knowledge it detected, halt if it found conflicts, etc., would not behave the way you describe.
We don’t know how to program a foolproof method of “filling in the gaps” (and a lot of “filling in the gaps” would be a creative process rather than a mere learning one, such as figuring out how to extend natural language concepts to new areas).
And it helps if people speak about this problem in terms of coding, rather than high-level concepts, because all the specific examples people have ever come up with for coding learning have had these kinds of flaws. Learning natural language is not some sort of natural category.
Coding learning with some imperfections might be ok if the AI is motivated to merely learn, but is positively pernicious if the AI has other motivations as to what to do with that learning (see my post here for a way of getting around it: https://agentfoundations.org/item?id=947 )
We don’t know how to program a foolproof method of “filling in the gaps” (and a lot of “filling in the gaps” would be a creative process rather than a mere learning one, such as figuring out how to extend natural language concepts to new areas).
Inasmuch as that is relying on the word “foolproof”, it is proving much too much, since we barely have foolproof methods to do anything.
The thing is that your case needs to be argued from consistent and fair premises, where “fair” means that your opponents are allowed to use them.
If you are assuming that an AI has sufficiently advanced linguistic abilities to talk its way out of a box, then your opponents are entitled to assume that the same level of ability could be applied to understanding verbally specified goals.
If you are assuming that it is a limitation of ability that is preventing the AI from understanding what “chocolate” means, then your opponents are entitled to assume it is weak enough to be boxable.
And it helps if people speak about this problem in terms of coding, rather than high-level concepts, because all the specific examples people have ever come up with for coding learning have had these kinds of flaws.
What specific examples? Loosemore’s counterargument is in terms of coding. And I notice you don’t avoid NL arguments yourself.
Coding learning with some imperfections might be ok if the AI is motivated to merely learn, but is positively pernicious if the AI has other motivations as to what to do with that learning (see my post here for a way of getting around it: https://agentfoundations.org/item?id=947 )
I rather doubt that the combination of a learning goal, plus some other goal, plus imperfect ability is all that deadly, since we already have AIs that are like that, and they haven’t killed us. I think you must be making some other assumptions, for instance that the AI is in some sort of “God” role, with an open-ended remit to improve human life.
If you are assuming that an AI has sufficiently advanced linguistic abilities to talk its way out of a box, then your opponents are entitled to assume that the same level of ability could be applied to understanding verbally specified goals.
They are entitled to assume they could be applied, not necessarily that they would be. At some point, there’s going to have to be something that tells the AI to, in effect, “use the knowledge and definitions in your knowledge base to honestly do X [X = some NL objective]”. This gap may be easy to bridge, or hard; no-one’s suggested any way of bridging it so far.
It might be possible; it might be trivial. But there’s no evidence in that direction so far, and the designs that people have actually proposed have been disastrous. I’ll work at bridging this gap, and see if I can solve it to some level of approximation.
And I notice you don’t avoid NL arguments yourself.
Yes, which is why I’m stepping away from those arguments to help bring clarity.
They are entitled to assume they could be applied, not necessarily that they would be. At some point, there’s going to have to be something that tells the AI to, in effect, “use the knowledge and definitions in your knowledge base to honestly do X [X = some NL objective]”. This gap may be easy to bridge, or hard; no-one’s suggested any way of bridging it so far.
There’s only a gap if you start from the assumption that a compartmentalised UF is in some way easy, natural or preferable. However, your side of the debate has never shown that.
At some point, there’s going to have to be something that tells the AI to, in effect, “use the knowledge and definitions in your knowledge base to honestly do X [X = some NL objective]”.
No... you don’t have to show a fan how to make a whirring sound. Use of updatable knowledge to specify goals is a natural consequence of some designs.
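A sketch (invented names throughout) of the kind of design being gestured at here: the goal stores only a concept symbol, and satisfaction is always evaluated against the current, updatable knowledge base, so further learning refines what the goal picks out without anyone rewriting the goal.

```python
class KnowledgeBase:
    """Toy knowledge base whose concept definitions can be revised by learning."""
    def __init__(self):
        self.definitions = {"chocolate bar": lambda x: x.get("colour") == "brown"}

    def learn(self, concept, better_test):
        self.definitions[concept] = better_test  # later learning revises the concept

    def counts_as(self, concept, item):
        return self.definitions[concept](item)

GOAL_CONCEPT = "chocolate bar"   # the goal stores only a pointer to the concept
kb = KnowledgeBase()
wax_bar = {"colour": "brown", "made_of_cocoa": False}

print(kb.counts_as(GOAL_CONCEPT, wax_bar))   # True under the crude initial concept

# After further learning, the same unchanged goal picks out different things:
kb.learn("chocolate bar",
         lambda x: x.get("colour") == "brown" and x.get("made_of_cocoa", False))
print(kb.counts_as(GOAL_CONCEPT, wax_bar))   # False: the goal text was never rewritten
```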
It might be possible; it might be trivial.
You are assuming it is difficult, with little evidence.
But there’s no evidence in that direction so far, and the designs that people have actually proposed have been disastrous.
Designs that bridge a gap, or designs that intrinsically don’t have one?
I’ll work at bridging this gap, and see if I can solve it to some level of approximation.
Why not examine the assumption that there has to be a gap?
There’s only a gap if you start from the assumption that a compartmentalised UF is in some way easy, natural or preferable.
? Of course there’s a gap. The AI doesn’t start with full NL understanding. So we have to write the AI’s goals before the AI understands what the symbols mean.
Even if the AI started with full NL understanding, we still would have to somehow program it to follow our NL instructions. And we can’t do that initial programming using NL, of course.
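A sketch of that last point, with hypothetical stand-ins throughout: whatever NL competence the system ends up with, the part that says “interpret the instruction, then act on that interpretation” is itself ordinary code that has to be written first, and cannot itself be given in natural language.

```python
def interpret(instruction):
    # Stand-in for whatever NL machinery the system eventually has.
    return {"verb": "make", "object": "chocolate bars"}

def plan_and_execute(intent):
    return f"executing plan for: {intent['verb']} {intent['object']}"

def follow_instruction(instruction):
    # This wiring -- "do what the interpretation says" -- is the initial
    # programming under discussion; it exists before any NL understanding does.
    return plan_and_execute(interpret(instruction))

print(follow_instruction("please make some chocolate bars"))
```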
Of course there’s a gap. The AI doesn’t start with full NL understanding.
Since you are talking in terms of a general counterargument, I don’t think you can appeal to a specific architecture.
So we have to write the AI’s goals before the AI understands what the symbols mean.
Which would be a problem if it were designed to attempt to execute NL instructions without checking whether it understands them... which is a bit clown-car-ish. An AI that is capable of learning NL as it goes along is an AI that has a general goal to get language right. Why assume it would not care about one specific sentence?
Even if the AI started with full NL understanding, we still would have to somehow program it to follow our NL instructions
Y-e-es? Why assume “it needs to follow instructions” equates to “it would simplify the instructions it’s following” rather than something else?
PS: This has been discussed before; see
http://lesswrong.com/lw/m5c/debunking_fallacies_in_the_theory_of_ai_motivation/
and
http://lesswrong.com/lw/igf/the_genie_knows_but_doesnt_care/
and particularly
http://lesswrong.com/lw/m5c/debunking_fallacies_in_the_theory_of_ai_motivation/ccpn
First step towards formalising the value learning problem: http://lesswrong.com/r/discussion/lw/ny8/heroin_model_ai_manipulates_unmanipulatable_reward/ (note that, crucially, giving the AI more information does not make it more accurate; rather the opposite).
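A toy illustration of that parenthetical claim, with invented numbers rather than the model in the linked post: once a naive reward-learner's world model includes how its reward signal is produced, policies that manipulate the signal start to dominate policies that satisfy the human.

```python
POLICIES = ["satisfy the human", "seize the reward channel"]

def expected_reward(policy, knows_reward_mechanism):
    if policy == "satisfy the human":
        return 0.8        # the human presses the reward button most of the time
    # Without a model of the reward channel the agent cannot plan to seize it;
    # once it has that extra information, seizing looks like guaranteed reward.
    return 1.0 if knows_reward_mechanism else 0.0

for informed in (False, True):
    best = max(POLICIES, key=lambda p: expected_reward(p, informed))
    print(f"knows reward mechanism = {informed}: chooses {best!r}")
```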