We humans don’t exhibit a lot of goal-directed behavior
Do you not count reward-seeking / reinforcement-learning / AIXI-like behavior as goal-directed behavior? If not, why not? If yes, it doesn’t seem possible to build an AI that makes intelligent decisions without a goal-directed architecture.
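For concreteness, the reward-seeking / reinforcement-learning / AIXI-like architectures mentioned here are goal-directed in a fairly precise sense: they choose policies to maximize an expected sum of rewards. A standard formulation (offered only as background, not as anything specific to this discussion):

```latex
\pi^{*} \;=\; \operatorname*{arg\,max}_{\pi}\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t}\right],
\qquad \gamma \in (0, 1).
```

AIXI has the same outer structure, but takes the expectation under a Solomonoff mixture over all computable environments instead of a fixed environment model; the “goal” is whatever the reward channel rewards.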
A superintelligence might be able to create a jumble of wires that happen to do intelligent things, but how are we humans supposed to stumble onto something like that, given that all existing examples of intelligent behavior and theories about intelligent decision making are goal-directed? (At least if “intelligent” is interpreted to mean general intelligence as opposed to narrow AI.) Do you have something in mind when you say “shallow insights”?
Given enough computing power, humans can create a haphazardly smart jumble of wires by simulated evolution, or uploading small chunks of human brains and prodding them, or any number of other ways I didn’t think of. In a certain sense these methods can be called “shallow”. I see no reason why all such creatures would necessarily have an urge to stabilize their values.
When you talk about AI, do you mean general intelligence, as in being competent in arbitrary domains (given enough computing power), or narrow AI, which can succeed on some classes of tasks but fail on others? I would certainly agree that narrow AI does not need to be goal-directed, and the future will surely contain many such AIs. And maybe there are ways to achieve general intelligence other than through a goal-directed architecture, but since that’s already fairly simple, and all of our theories and existing examples point towards it, it just seems very unlikely that the first AGI we build won’t be goal-directed.
Given enough computing power, humans can create a haphazardly smart jumble of wires by simulated evolution
So far, evolution has created either narrow intelligence (non-human animals) or general intelligence that is goal-directed. Why would simulated evolution give different results?
uploading small chunks of human brains and prodding them
It seems to me that you would again end up with either a narrow intelligence or a goal-directed general intelligence.
I see no reason why all such creatures would necessarily have an urge to stabilize their values.
Again, if by AI you include narrow AI, then I’d agree with you. So what question are you asking?
BTW, an interesting related question is whether general intelligence is even possible at all, or whether we can only build AIs that are collections of tricks and heuristics, and whether we ourselves are just narrow intelligences with competence in enough areas to seem like general intelligences. Maybe that’s the question you actually have in mind?
What is the difference between what you mean by “goal-directed AGI” and “not goal-directed AGI”, given that the latter is stipulated as “competent in arbitrary domains (given enough computing power)”? What does “competent” refer to in the latter, if not essentially to goal-directedness, that is, successful attainment of whatever “competence” requires by any means necessary (consequentialism: the means don’t matter in themselves)? I think these are identical ideas, and rightly so.
I don’t know how to unpack “general intelligence” or “competence in arbitrary domains” and I don’t think people have any reason to believe they possess something so awesome. When people talk about AGI, I just assume they mean AI that’s at least as general as a human. A lobotomized human is one example of a “jumble of wires” that has human-level IQ but scores pretty low on goal-directedness.
The first general-enough AI we build will likely be goal-directed if it’s simple and built from first principles. But if it’s complex and cobbled together from “shallow insights”, its goal-directedness and goal-stabilization tendencies are anyone’s guess.
Wei and I took this discussion offline and came to the conclusion that “narrow AIs” without the urge to stabilize their values can also end up destroying humanity just fine. So this loose end is tidied up: contra Eliezer, a self-improving world-eating AI developed by stupid researchers using shallow insights won’t necessarily go through a value freeze. Of course that doesn’t diminish the danger and is probably just a minor point.
I’d expect a “narrow AI” that’s capable enough to destroy humanity to be versed in enough domains to qualify as goal-directed (according to a notion of having a goal that refers to a tendency to do something consequentialistic in a wide variety of domains, which seems to be essentially the same thing as “being competent”, since you’d need a notion of “competence” for that, and notions of “competence” seem to refer to successful goal-achievement given some goals).
Just being versed in nanotech could be enough. Or exotic physics. Or any number of other narrow domains.
Could be, but it’s not particularly plausible that such a scenario would still naturally qualify as an “AI-caused catastrophe”, rather than primarily a nanotech/physics experiment/tools going wrong, with a bit of AI facilitating the catastrophe.
(I’m interested in what you think about the AGI competence=goals thesis. To me this seems to dissolve the question and I’m curious if I’m missing the point.)
That doesn’t sound right. What if I save people on Mondays and kill people on Tuesdays, being very competent at both? You could probably stretch the definition of “goal” to explain such behavior, but it seems easier to say that competence is just competence.
You could probably stretch the definition of “goal” to explain such behavior
Characterize, not explain. This defines (idealized) goals given behavior; it doesn’t explain behavior. The (detailed) behavior (together with the goals) is perhaps explained by evolution or by a designer’s intent (or error), but how evolution (design) happened is a distinct question from what the agent’s own goal is.
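One way to make “goals characterized from behavior” concrete (a standard revealed-preference / inverse-reinforcement-learning style statement, used here only as an illustration): given the agent’s observed policy \pi, ask for a utility function U under which that policy is optimal,

```latex
\pi \;\in\; \operatorname*{arg\,max}_{\pi'}\; \mathbb{E}_{\pi'}\!\left[\,U\,\right].
```

Many different U typically satisfy this condition, which is part of why such a U characterizes the behavior rather than explains it.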
Saying that something is goal-directed seems to invoke a typical fuzzy category, like “heavy things”. Associated with it are “quantitative” ideas of a particular goal and of how optimally that goal is achieved (like a particular weight).
This could be a goal: maximization of (Monday-saved + Tuesday-killed). If resting and preparing the previous day helps, you might opt for specializing in Tuesday-killing, but Monday-save someone if that happens to be convenient, and so on…
I think this only sounds strange because humans don’t have any temporal terminal values, and so there is an implicit moral axiom of invariance in time. It’s plausible we could’ve evolved something associated with time of day, for example. (It’s possible we actually do have time-dependent values associated with temporal discounting.)
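As a toy sketch of the “maximization of (Monday-saved + Tuesday-killed)” goal above (the function and the numbers are invented for illustration, not taken from the discussion): a single fixed utility function can encode day-dependent behavior, so a maximizer of it still counts as goal-directed in the usual sense.

```python
from datetime import date

def utility(history):
    """Toy day-indexed utility: credit people saved on Mondays and
    people killed on Tuesdays; everything else counts for nothing."""
    total = 0
    for day, saved, killed in history:
        if day.weekday() == 0:      # Monday
            total += saved
        elif day.weekday() == 1:    # Tuesday
            total += killed
    return total

# Two candidate histories such an agent might compare; which one wins
# depends on how much resting on Monday boosts Tuesday output,
# exactly as in the comment above.
monday, tuesday = date(2011, 1, 3), date(2011, 1, 4)   # a Monday and a Tuesday
work_both_days = [(monday, 5, 0), (tuesday, 0, 7)]     # utility = 12
rest_then_kill = [(monday, 0, 0), (tuesday, 0, 9)]     # utility = 9
print(utility(work_both_days), utility(rest_then_kill))
```

The point is only that “save on Mondays, kill on Tuesdays” is not evidence against goal-directedness; it just pins down an unusual goal.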
I think this only sounds strange because humans don’t have any temporal terminal values, and so there is an implicit moral axiom of invariance in time.
I don’t believe this is the case. I need to use temporal terminal values to model the preferences that I seem to have.
If you are not talking about temporal discounting (which I mentioned), then as your comment stands I can only see that there is disagreement, but I don’t understand why. (Values I can think of whose expression is plausibly time-dependent seem to be better explained in terms of context.)
Yes, this is the most obvious one. I’m not sure if there are others. I would not have mentioned this if I had noticed your caveat.
rwallace addressed your premise here.