But why would AIs behave the same way if they don’t think verbally?
Part of the problem, it appears to me, is that you’re ascribing a verbal understanding to a mechanical process. Consider: for AIs to have values, those values must be ‘stored’ in a medium compatible with their calculations.
However, once an AI begins to ‘improve’ itself—that is, once an AI has as an available “goal” the ability to form better goals—then it’s going to base its decisions about what counts as an improved goal on the goals and values it already has. This will cause it to ‘stabilize’ upon a specific set of higher-order values / goals.
Once the AI “decides” that becoming a better paperclip maker is something it values, it is going to value valuing making itself a better paperclip optimizer, recursively, in a positive feedback loop that then anchors upon a specific position.
This can, quite easily, be expressed in mathematical / computational terms—though I am insufficient to the task of doing so.
A different way of viewing it is that once intentionality is introduced to assigning value, assigning value has an assigned value. Recursion of goal-orientation can then be seen as producing a ‘gravity’ in the then-existing values.
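To make the picture concrete, here is a minimal sketch of the kind of dynamics being described, in Python. Everything in it (the goal names, the endorsement functions, the update rule) is invented for illustration; it is not a formalization, just a toy in which a self-reinforcing value crowds out the others and the weights settle on a fixed point.

```python
# Toy sketch only: goals as weighted nodes. Each round, every goal's weight is
# scaled by how strongly the current goal set "endorses" it, then the weights
# are renormalized. With a self-reinforcing goal in the mix, the loop drives
# the weights toward a fixed point: the "anchoring" / "gravity" described above.

def update(weights, endorsement):
    """One round of 'valuing the act of valuing'."""
    raw = {g: w * endorsement[g](weights) for g, w in weights.items()}
    total = sum(raw.values())
    return {g: v / total for g, v in raw.items()}

# Hypothetical goal set: 'paperclips' endorses itself more the more weight it
# already carries; 'tidiness' is endorsed at a flat rate.
endorsement = {
    "paperclips": lambda w: 1.0 + w["paperclips"],
    "tidiness":   lambda w: 1.0,
}

weights = {"paperclips": 0.5, "tidiness": 0.5}
for _ in range(50):
    weights = update(weights, endorsement)

print(weights)  # 'paperclips' ends up carrying nearly all of the weight
```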
EDIT: To those of you downvoting, would you care to explain what you disagree with that is causing you to do so?
I haven’t downvoted you, but I suspect that the downvotes are arising from two remarks:
Part of the problem, it appears to me, is that you’re ascribing a verbal understanding to a mechanical process.
This sentence seems off. It isn’t clear what is meant by mechanical in this context other than to shove through a host of implied connotations.
Also:
This can, quite easily, be expressed in mathematical / computational terms—though I am insufficient to the task of doing so.
I could see this sentence as being a cause for downvotes. Asserting that something non-trivial can be put in terms of math when one can’t do so on one’s own and doesn’t provide a reference seems less than conducive to good discussion.
This sentence seems off. It isn’t clear what is meant by mechanical in this context other than to shove through a host of implied connotations.
Hrm. If I had used the word “procedural” rather than “mechanical”, would that have, do you think, prevented this impression?
Asserting that something non-trivial can be put in terms of math when one can’t do so on one’s own and doesn’t provide a reference seems less than conducive to good discussion.
If I am not a physicist, does that disqualify me from making claims about what a physicist would be relatively easily able to do? For example: “I’m not sufficient to the task of calculating my current relativistic mass—but anyone who works with general relativity would have no trouble doing this.”
So what am I missing with this element? Because I genuinely cannot see a difference between “a mathematician / AI worker could express in mathematical or computational terms the nature of recursive selection pressure” and “a general relativity physicist could calculate my relativistic mass relative to the Earth” in terms of the exceptionalism of either claim.
Is it perhaps that my wording appears to be implying that I meant more than “goals can be arranged in a graph of interdependent nodes that recursively update one another for weighting”?
Part of the reason why the sentence bothers me is that I’m a mathematician and it wasn’t obvious to me that there is a useful way of making the statement mathematically precise.
Is it perhaps that my wording appears to be implying that I meant more than “goals can be arranged in a graph of interdependent nodes that recursively update one another for weighting”?
So this is a little better and that may be part of it. Unfortunately, it isn’t completely obvious that this is true either. This is a property that we want goal systems to have in some form. It isn’t obvious that all goal systems in some broad sense will necessarily do so.
It isn’t obvious that all goal systems in some broad sense will necessarily do so.
“All” goal systems don’t have to; only some. The words I use to form this sentence do not comprise the whole of the available words of the English language—just the ones that are “interesting” to this sentence.
It would seem implicit that any computationally-based artificial intelligence would have a framework for computing. If that AI has volition, then it has goals. Since the topic we’re already discussing is a recursively improving AI, it has volition: direction. So we see that it must, by definition, have computable goals.
Now, for my statement to be true—the original one that was causing the problems, that is—it’s only necessary that this be expressible in “mathematical / computational terms”. Those terms need not be practically useful—in much the same way that a “proof of concept” is not the same thing as a “finished product”.
Additionally, I have some trouble grappling with the rejection of that original statement, given that values can be defined as “beliefs about what should be”—and we already express beliefs in Bayesian terms as a matter of course on this site.
What I mean here is, given the new goal of finding better ways for me to communicate to LWers—what’s the difference here? Why is it not okay for me to make statements that rest on commonly accepted ‘truths’ of LessWrong?
Is it the admission of my own incompetence to derive that information “from scratch”? Is it my admission to a non-mathematically-rigorous understanding of what is mathematically expressible?
(If it is the latter, then I find myself leaning towards the conclusion that the problem isn’t with me, but with the people who downvote me for it.)
I would downvote a comment that confidently asserted a claim of which I am dubious, when the author has no particular evidence for it, and admits to having no evidence for it.
This applies even if many people share the belief being asserted. I can’t downvote a common unsupported belief, but I can downvote the unsupported expression of it.
Every process is a mechanical one.
Reductively, yes. But this is like saying “every biological process is a physical process”. While trivially true, it is not very informative. Especially when attempting to relate to someone that much of their problem in understanding a specific situation is that they are “viewing it from the wrong angle”.
This can, quite easily, be expressed in mathematical / computational terms—though I am insufficient to the task of doing so.
I am skeptical of this claim. I’m not at all convinced that it’s feasible to formalize “goal” or that if we could formalize it, the claim would be true in general. Software is awfully general, and I can easily imagine a system that has some sort of constraint on its self-modification, where that constraint can’t be self-modified away. I can also imagine a system that doesn’t have an explicit constraint on its evolution but that isn’t an ideal self-modifier. Humans, for instance, have goals and a limited capacity to self-modify, but we don’t usually see them become totally dedicated to any one goal.
I am skeptical of this claim. I’m not at all convinced that it’s feasible to formalize “goal” or that if we could formalize it, the claim would be true in general.
Would you agree that Bayesian Belief Nets can be described/expressed in the form of a graph of nodal points? Can you describe an intelligible reason why values should not be treated as “ought” beliefs (that is, beliefs about what should be)?
Furthermore, why does it need to be general? We’re discussing a specific category of AI. Are you aware of any AI research ongoing that would support the notion that AIs wouldn’t have some sort of systematic categorization of beliefs and values?
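For what it’s worth, here is a toy illustration of what it looks like to store “ought” beliefs in the same graph structure as ordinary beliefs. The node names and numbers are entirely invented; this is not drawn from any actual AI system.

```python
# Toy sketch (invented names and numbers): beliefs, including "ought" beliefs
# about what should be, represented uniformly as nodes in a directed graph.
# Each node lists its parents and a conditional probability table keyed by the
# truth-values of those parents.

network = {
    "paperclips_scarce":     {"parents": [], "cpt": {(): 0.3}},
    # An "ought" belief, stored the same way as any factual belief:
    "ought_make_paperclips": {"parents": ["paperclips_scarce"],
                              "cpt": {(True,): 0.9, (False,): 0.2}},
    "ought_improve_self":    {"parents": ["ought_make_paperclips"],
                              "cpt": {(True,): 0.8, (False,): 0.1}},
}

def prob(node, assignment, net=network):
    """P(node is True) given a True/False assignment to its parents."""
    parents = tuple(assignment[p] for p in net[node]["parents"])
    return net[node]["cpt"][parents]

print(prob("ought_make_paperclips", {"paperclips_scarce": True}))   # 0.9
print(prob("ought_improve_self", {"ought_make_paperclips": True}))  # 0.8
```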
Humans, for instance, have goals and a limited capacity to self-modify, but we don’t usually see them become totally dedicated to any one goal.
That’s not an accurate description of the scenario being discussed. We’re not discussing fixation upon a single value/goal but the fixation of specific SETS of goals.
I can think of several good reasons why values might not be incorporated into a system as “ought” beliefs. If my AI isn’t very good at reasoning, I might, for instance, find it simpler to construct a black-box “does this action have consequence X” property-checker and incorporate that into the system somewhere. The rest of the system has no access to the internals of the black box—it just supplies a proposed course of action and gets back a YES or a NO.
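A rough sketch of that arrangement (the function names and the stand-in test are made up for illustration):

```python
# Sketch of the black-box arrangement described above. The checker is opaque
# to the rest of the system: callers pass a proposed action and get back only
# a yes/no answer, never the checker's internals.

def _has_consequence_x(action: str) -> bool:
    # Stand-in for whatever opaque test the designer wired in.
    return "disassemble humans" in action

def check(action: str) -> str:
    """The only interface the rest of the system sees."""
    return "NO" if _has_consequence_x(action) else "YES"

def plan(candidate_actions):
    """A planner that filters candidates purely via the black box."""
    return [a for a in candidate_actions if check(a) == "YES"]

print(plan(["buy wire", "disassemble humans for iron", "fold paperclips"]))
# -> ['buy wire', 'fold paperclips']
```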
You ask whether there’s “any AI research ongoing that would support the notion that AIs wouldn’t have some sort of systematic categorization of beliefs and values?”
Most of what’s currently published at major AI research conferences describes systems that don’t have any such systematic characterization. Suppose we built a super-duper Watson that passed the Turing test and had some limited capacity to improve itself by, e.g., going out and fetching new information from the Internet. That sort of system strikes me as the likeliest one to meet the bar of “AGI” in the next few years. It isn’t particularly far from current research.
Before you quibble about whether that’s the kind of system we’re talking about—I haven’t seen a good definition of “self-improving” program, and I suspect it is not at all straightforward to define. Among other reasons, I don’t know a good definition that separates ‘code’ and ‘data’. So if you don’t like the example above, you should make sure that there’s a clear difference between choosing what inputs to read (which modifies internal state) and choosing what code to load (which also modifies internal state).
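To illustrate the point about code and data (everything here is invented): from the system’s point of view, reading fetched text and loading fetched code are both just updates to internal state.

```python
# Illustration only: "reading data" and "loading code" are both just state
# transitions; nothing structural distinguishes them from the outside.

state = {"facts": [], "behaviors": {}}

def read_input(text: str) -> None:
    """'Data': fetched text updates internal state."""
    state["facts"].append(text)

def load_code(source: str) -> None:
    """'Code': fetched text also just updates internal state,
    here by defining new callable behaviors via exec()."""
    namespace = {}
    exec(source, namespace)
    state["behaviors"].update(
        {k: v for k, v in namespace.items()
         if callable(v) and not k.startswith("__")}
    )

read_input("tungsten melts at 3422 C")
load_code("def greet(): return 'hello'")
print(state["facts"], list(state["behaviors"]))  # both calls just changed state
```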
As to the human example: my sense is that humans don’t get locked to any one set of goals; that goals continue to evolve, without much careful pruning, over a human lifetime. Expecting an AI to tinker with its goals for a while, and then stop, is asking it to do something that neither natural intelligences nor existing software seems to do or even be capable of doing.
Suppose we built a super-duper Watson that passed the Turing test and had some limited capacity to improve itself by, e.g., going out and fetching new information from the Internet. That sort of system strikes me as the likeliest one to meet the bar of “AGI” in the next few years. It isn’t particularly far from current research.
This seems like a plausible way of blowing up the universe, but not in the next few years. This kind of thing requires a lot of development; I’d give it 30-60 years at least.
Most of what’s currently published at major AI research conferences describes systems that don’t have any such systematic characterization. Suppose we built a super-duper Watson
… I think we’re having a major breakdown of communication because to my understanding Watson does exactly what you just claimed no AI at research conferences is doing.
Before you quibble about whether that’s the kind of system we’re talking about—I haven’t seen a good definition of “self-improving” program, and I suspect it is not at all straightforward to define.
I’m sure. But there are a few generally sound assertions we can make (a small sketch follows the two points):
To be self-improving the machine must be able to examine its own code / be “metacognitive.”
To be self-improving the machine must be able to produce a target state.
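A minimal sketch of those two assertions in code; the file-reading mechanism and the “target” predicate are stand-ins invented for illustration, not a proposal for how a real self-improver would work.

```python
# Minimal sketch of the two assertions above (invented stand-ins): a program
# that can read its own code and test it against a target condition.
from pathlib import Path

def examine_self() -> str:
    """Assertion 1: the machine can inspect its own code."""
    return Path(__file__).read_text()

def meets_target(source: str) -> bool:
    """Assertion 2: the machine can state a target condition for itself.
    (Trivial stand-in: the source should mention 'paperclip'.)"""
    return "paperclip" in source

if __name__ == "__main__":
    print(meets_target(examine_self()))  # True: the word appears in this file
```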
From these two, the notion of value fixation in such an AI becomes trivial. Even if that version of the AI had man-made value-fixation, what about the AI it itself codes? If the AI were actually smarter than us, that wouldn’t exactly be the safest route to take. Even Asimov’s Three Laws yielded a Zeroth Law.
Expecting an AI to tinker with its goals for a while, and then stop,
Don’t anthropomorphize. :)
If you’ll recall from my description, I have no such expectation. Instead, I spoke of recursive refinement causing apparent fixation in the form of “gravity” or “stickiness” towards a specific set of values.
Why is this unlike how humans normally are? Well, we don’t have much access to our own actual values.
I downvoted because demands that people justify their downvoting rub me the wrong way.
I apologize, then, for my desire to become a better commenter here on LessWrong.
And I downvote apologies that are inherently insincere. :)
Fair enough.