An elementary error. The constraints in question are referred to in the literature as “weak” constraints (and I believe I used that qualifier in the paper: I almost always do). Weak constraints never need to be ALL satisfied at once. No AI could ever be designed that way, and no one ever suggested that one would be. See the reference to McClelland, J.L., Rumelhart, D.E. & Hinton, G.E. (1986) in the paper: it gives a pretty good explanation of weak constraints.
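For concreteness, here is a minimal sketch of a weak constraint system (the constraints, weights, and candidate states are invented purely for illustration): every constraint contributes a weighted penalty, the system settles on the state with the lowest total penalty, and no individual constraint is guaranteed satisfaction.

```python
# A minimal sketch of a weak (soft) constraint system: every constraint
# contributes a weighted penalty, and the chosen state is the one with
# the lowest TOTAL penalty. No individual constraint must be satisfied.
# (Constraints and weights here are made up for illustration.)

def total_penalty(state, constraints):
    """Sum of weight * violation over all weak constraints."""
    return sum(w * violated(state) for w, violated in constraints)

# Each constraint: (weight, function returning 1 if violated else 0).
constraints = [
    (3.0, lambda s: 0 if s["shape"] == "cube" else 1),
    (1.0, lambda s: 0 if s["color"] == "red" else 1),
    (1.0, lambda s: 0 if s["color"] == "blue" else 1),  # conflicts with the one above
]

candidates = [
    {"shape": "cube", "color": "red"},
    {"shape": "sphere", "color": "blue"},
]

best = min(candidates, key=lambda s: total_penalty(s, constraints))
# The "cube, red" state wins despite violating the "blue" constraint:
# weak constraints trade off against one another; because two of them
# conflict, they are never all satisfied at once.
```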
I understand the concept.
How exactly do you propose that the AI “weighs contextual constraints incorrectly” when the process of weighing constraints requires most of the constraints involved (probably thousands of them) to all suffer a simultaneous, INDEPENDENT ‘failure’ for this to occur?
I’d hazard a guess that, for any given position, less than 70% of humans will agree without reservation. The issue isn’t that thousands of failures occur. The issue is that thousands of failures -always- occur.
Assuming this isn’t more of the same, what you are saying here is isomorphic to the statement that a neural net might somehow figure out the correct weighting for all the connections so that it produces the correctly trained output for a given input. That problem has been solved in so many different NN systems that most NN people, these days, would find your statement puzzling.
The problem is solved only for well-understood (and very limited) problem domains with comprehensive training sets.
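As a sketch of the one point both sides seem to accept here, that training works when the domain is trivial and the training set covers it completely, consider a one-weight “network” fitted by gradient descent (a toy of my own construction, not anyone’s actual system):

```python
# Toy illustration: for a trivial, fully covered domain (y = 2x),
# gradient descent on squared error reliably finds the right weight.

def train(pairs, lr=0.1, steps=200):
    w = 0.0
    for _ in range(steps):
        for x, y in pairs:
            w -= lr * 2 * (w * x - y) * x  # d/dw of (w*x - y)^2
    return w

w = train([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])  # a "comprehensive" training set
# w converges to 2.0, so the net reproduces the trained mapping
```

The point in contention is not this mechanism but whether anything like it scales to a domain as ill-specified as ethics.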
A trivial variant of your second failure mode. The AI is calculating the constraints correctly, according to you, but at the same time you suggest that it has somehow NOT included any of the constraints that relate to the ethics of forced sterilization, etc. etc. You offer no explanation of why all of those constraints were not counted by your proposed AI, you just state that they weren’t.
They were counted. They are, however, weak constraints. The constraints which required human extinction outweighed them, as they do for countless human beings. Fortunately for us in this imagined scenario, the constraints against killing people counted for more.
This is identical to your third failure mode, but here you produce a different list of constraints that were ignored. Again, with no explanation of why a massive collection of constraints suddenly disappeared.
Again, they weren’t ignored. They are, as you say, weak constraints. Other constraints overrode them.
Another insult, and putting words into my mouth, and showing no understanding of what a weak constraint system actually is.
The issue here isn’t my lack of understanding. The issue here is that you are implicitly privileging some constraints over others without any justification.
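The objection can be made concrete with a sketch (the constraints, weights, and actions are hypothetical, invented here for illustration): the same weak constraint set, under two different weightings, endorses opposite actions, and nothing inside the formalism selects the weighting.

```python
# Sketch of the objection: the SAME weak constraints, weighted
# differently, endorse different actions. Nothing in the formalism
# says which weighting is the right one. (Illustrative numbers only.)

def choose(action_scores):
    """Pick the action whose weighted constraint score is highest."""
    return max(action_scores, key=action_scores.get)

weights_a = {"avoid_harm": 5.0, "maximize_welfare": 1.0}
weights_b = {"avoid_harm": 1.0, "maximize_welfare": 5.0}

satisfaction = {  # degree to which each action satisfies each constraint
    "do_nothing": {"avoid_harm": 1.0, "maximize_welfare": 0.0},
    "intervene":  {"avoid_harm": 0.2, "maximize_welfare": 1.0},
}

def score(action, weights):
    return sum(weights[c] * satisfaction[action][c] for c in weights)

pick_a = choose({a: score(a, weights_a) for a in satisfaction})  # "do_nothing"
pick_b = choose({a: score(a, weights_b) for a in satisfaction})  # "intervene"
```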
Every single conclusion I reached here is one that humans—including very intelligent humans—have reached. By dismissing them as possible conclusions an AI could reach, you’re implicitly rejecting every argument advanced for each of these positions without first considering them; you simply assert that the “weak constraints” prevent them.
I didn’t choose -wrong- conclusions, you see; I just chose -unpopular- conclusions, conclusions I knew you’d find objectionable. You should have noticed that; you didn’t, because you were too concerned with proving that an AI wouldn’t do them. You were too concerned with your destination, and paid no attention to your travel route.
If doing nothing is the correct conclusion, your AI should do nothing. If human extinction is the correct conclusion, your AI should choose human extinction. If sterilizing people with unhealthy genes is the correct conclusion, your AI should sterilize people with unhealthy genes (you didn’t notice that humans didn’t necessarily go extinct in that scenario). If rewriting minds is the correct conclusion, your AI should rewrite minds.
And if your constraints prevent the AI from undertaking the correct conclusion?
Then your constraints have made your AI stupid, for some value of “stupid”.
The issue, of course, is that you have decided that you know better what is or is not the correct conclusion than an intelligence you are supposedly creating to know things better than you.
How exactly do you propose that the AI “weighs contextual constraints incorrectly” when the process of weighing constraints requires most of the constraints involved (probably thousands of them) to all suffer a simultaneous, INDEPENDENT ‘failure’ for this to occur?
And your reply was:
I’d hazard a guess that, for any given position, less than 70% of humans will agree without reservation. The issue isn’t that thousands of failures occur. The issue is that thousands of failures -always- occur.
This reveals that you are really not understanding what a weak constraint system is, and where the system is located.
When the human mind looks at a scene and uses a thousand clues in the scene to constrain the interpretation of it, those thousand clues all, when the network settles, relax into a state in which most or all of them agree about what is being seen. You don’t get “less than 70%” agreement on the interpretation of the scene! If even one element of the scene violates a constraint in a strong way, the mind orients toward the violation extremely rapidly.
The same story applies to countless other examples of weak constraint relaxation systems dropping down into energy minima.
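The relaxation story being described can be sketched with a toy Hopfield-style network (the symmetric weights are hand-picked and purely illustrative): each asynchronous update can only lower the energy, so the state settles into a minimum where the units agree.

```python
# Toy Hopfield-style relaxation: three mutually reinforcing units.
# Each unit flips to agree with the weighted sum of the others,
# which never raises the energy, so the network settles.

W = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]  # symmetric weights: every unit supports the others

def energy(s):
    return -sum(W[i][j] * s[i] * s[j] for i in range(3) for j in range(i + 1, 3))

def relax(s):
    s = list(s)
    changed = True
    while changed:
        changed = False
        for i in range(len(s)):
            field = sum(W[i][j] * s[j] for j in range(len(s)))
            new = 1 if field >= 0 else -1
            if new != s[i]:
                s[i], changed = new, True
    return s

start = [1, -1, 1]       # one "clue" initially disagrees
settled = relax(start)   # -> [1, 1, 1]: the clues end up agreeing
```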
Let me know when you do understand what you are talking about, and we can resume.
There is no energy minimum if your goal is Friendliness. There is no “correct” answer. No matter what your AI does, no matter what architecture it uses, with respect to human goals and concerns, there is going to be a sizable percentage to whom it is unequivocally Unfriendly.
This isn’t an image-interpretation problem. The first problem you have to solve in order to train the system is—what are you training it to do?
You’re skipping the actual difficult issue in favor of an imaginary and easily solved one.
there is going to be a sizable percentage to whom it is unequivocally Unfriendly
Unfriendly is an equivocal term.
“Friendliness” is ambiguous. It can mean safety, i.e., not making things worse; or it can mean making things better, creating paradise on Earth.
Friendliness in the second sense is a superset of morality. A friendly AI will be moral, a moral AI will not necessarily be friendly.
“Unfriendliness” is similarly ambiguous: an unfriendly AI may be downright dangerous; or it might have enough grasp of ethics to be safe, but not enough to be able to make the world a much more fun place for humans. Unfriendliness in the second sense is not, strictly speaking, a safety issue.
A lot of people are able to survive the fact that some institutions, movements and ideologies are unfriendly to them, for some value of unfriendly. Unfriendliness doesn’t have to be terminal.
And that, the assumption that you know better than the intelligence you are creating, sums up the issue.
Everything is equivocal to someone. Do you disagree with my fundamental assertion?
I can’t answer unequivocally for the reasons given.
There won’t be a sizeable percentage to whom the AI is unfriendly in the sense of obliterating them.
There might well be a percentage to whom the AI is unfriendly in some business as usual sense.
Obliterating them is only bad by your ethical system. Other ethical systems may hold other things to be even worse.
Irrelevant.
You responded to me in this case. It’s wholly relevant to my point that You-Friendly AI isn’t a sufficient condition for Human-Friendly AI.
However, there are a lot of “wrong” answers.