We humans don’t exhibit a lot of goal-directed behavior
Do you not count reward-seeking / reinforcement-learning / AIXI-like behavior as goal-directed behavior? If not, why not? If yes, it doesn’t seem possible to build an AI that makes intelligent decisions without a goal-directed architecture.
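For concreteness, the reward-seeking / reinforcement-learning / AIXI-like architectures mentioned here are goal-directed in a fairly precise sense: they choose policies to maximize an expected sum of rewards. A standard formulation (offered only as background, not as anything specific to this discussion):

```latex
\pi^{*} \;=\; \operatorname*{arg\,max}_{\pi}\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t}\right],
\qquad \gamma \in (0, 1).
```

AIXI has the same outer structure, but takes the expectation under a Solomonoff mixture over all computable environments instead of a fixed environment model; the “goal” is whatever the reward channel rewards.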
A superintelligence might be able to create a jumble of wires that happen to do intelligent things, but how are we humans supposed to stumble onto something like that, given that all existing examples of intelligent behavior and theories about intelligent decision making are goal-directed? (At least if “intelligent” is interpreted to mean general intelligence as opposed to narrow AI.) Do you have something in mind when you say “shallow insights”?
Given enough computing power, humans can create a haphazardly smart jumble of wires by simulated evolution, or uploading small chunks of human brains and prodding them, or any number of other ways I didn’t think of. In a certain sense these methods can be called “shallow”. I see no reason why all such creatures would necessarily have an urge to stabilize their values.
When you talk about AI, do you mean general intelligence, as in being competent in arbitrary domains (given enough computing power), or narrow AI, which can succeed on some classes of tasks but fail on others? I would certainly agree that narrow AI does not need to be goal-directed, and the future will surely contain many such AIs. And maybe there are ways to achieve general intelligence other than through a goal-directed architecture, but since that’s already fairly simple, and all of our theories and existing examples point towards it, it just seems very unlikely that the first AGI we build won’t be goal-directed.
Given enough computing power, humans can create a haphazardly smart jumble of wires by simulated evolution
So far, evolution has created either narrow intelligence (non-human animals) or general intelligence that is goal-directed. Why would simulated evolution give different results?
uploading small chunks of human brains and prodding them
It seems to me that you would again end up with either a narrow intelligence or a goal-directed general intelligence.
I see no reason why all such creatures would necessarily have an urge to stabilize their values.
Again, if by AI you include narrow AI, then I’d agree with you. So what question are you asking?
BTW, an interesting related question is whether general intelligence is even possible at all, or whether we can only build AIs that are collections of tricks and heuristics, and whether we ourselves are just narrow intelligences with competence in enough areas to seem like general intelligences. Maybe that’s the question you actually have in mind?
What is the difference between what you mean by “goal-directed AGI” and “not goal-directed AGI”, given that the latter is stipulated as “competent in arbitrary domains (given enough computing power)”? What does “competent” refer to in the latter, if not essentially to goal-directedness, that is, successful attainment of whatever “competence” requires by any means necessary (consequentialism: the means don’t matter in themselves)? I think these are identical ideas, and rightly so.
I don’t know how to unpack “general intelligence” or “competence in arbitrary domains” and I don’t think people have any reason to believe they possess something so awesome. When people talk about AGI, I just assume they mean AI that’s at least as general as a human. A lobotomized human is one example of a “jumble of wires” that has human-level IQ but scores pretty low on goal-directedness.
The first general-enough AI we build will likely be goal-directed if it’s simple and built from first principles. But if it’s complex and cobbled together from “shallow insights”, its goal-directedness and goal-stabilization tendencies are anyone’s guess.
Wei and I took this discussion offline and came to the conclusion that “narrow AIs” without the urge to stabilize their values can also end up destroying humanity just fine. So this loose end is tidied up: contra Eliezer, a self-improving world-eating AI developed by stupid researchers using shallow insights won’t necessarily go through a value freeze. Of course that doesn’t diminish the danger and is probably just a minor point.
I’d expect a “narrow AI” that’s capable enough to destroy humanity to be versed in enough domains to qualify as goal-directed (according to a notion of having a goal that refers to a tendency to do something consequentialistic in a wide variety of domains, which seems to be essentially the same thing as “being competent”, since you’d need a notion of “competence” for that, and notions of “competence” seem to refer to successful goal-achievement given some goals).
Just being versed in nanotech could be enough. Or exotic physics. Or any number of other narrow domains.
Could be, but it’s not particularly plausible that such a scenario would still naturally qualify as an “AI-caused catastrophe”, rather than primarily a nanotech/physics experiment/tools going wrong, with a bit of AI facilitating the catastrophe.
(I’m interested in what you think about the AGI competence=goals thesis. To me this seems to dissolve the question and I’m curious if I’m missing the point.)
That doesn’t sound right. What if I save people on Mondays and kill people on Tuesdays, being very competent at both? You could probably stretch the definition of “goal” to explain such behavior, but it seems easier to say that competence is just competence.
You could probably stretch the definition of “goal” to explain such behavior
Characterize, not explain. This defines (idealized) goals given behavior; it doesn’t explain behavior. The (detailed) behavior (together with the goals) is perhaps explained by evolution or by a designer’s intent (or error), but how evolution (design) happened is a distinct question from what the agent’s own goal is.
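One way to make “goals characterized from behavior” concrete (a standard revealed-preference / inverse-reinforcement-learning style statement, used here only as an illustration): given the agent’s observed policy \pi, ask for a utility function U under which that policy is optimal,

```latex
\pi \;\in\; \operatorname*{arg\,max}_{\pi'}\; \mathbb{E}_{\pi'}\!\left[\,U\,\right].
```

Many different U typically satisfy this condition, which is part of why such a U characterizes the behavior rather than explains it.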
Saying that something is goal-directed seems to invoke a typical fuzzy category, like “heavy things”. Associated with it are “quantitative” ideas of a particular goal and of how optimally that goal is achieved (like a particular weight).
This could be a goal: maximization of (Monday-saved + Tuesday-killed). If resting and preparing the previous day helps, you might opt for specializing in Tuesday-killing, but Monday-save someone if that happens to be convenient, and so on…
I think this only sounds strange because humans don’t have any temporal terminal values, and so there is an implicit moral axiom of invariance in time. It’s plausible we could’ve evolved something associated with time of day, for example. (It’s possible we actually do have time-dependent values associated with temporal discounting.)
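As a toy sketch of the “maximization of (Monday-saved + Tuesday-killed)” goal above (the function and the numbers are invented for illustration, not taken from the discussion): a single fixed utility function can encode day-dependent behavior, so a maximizer of it still counts as goal-directed in the usual sense.

```python
from datetime import date

def utility(history):
    """Toy day-indexed utility: credit people saved on Mondays and
    people killed on Tuesdays; everything else counts for nothing."""
    total = 0
    for day, saved, killed in history:
        if day.weekday() == 0:      # Monday
            total += saved
        elif day.weekday() == 1:    # Tuesday
            total += killed
    return total

# Two candidate histories such an agent might compare; which one wins
# depends on how much resting on Monday boosts Tuesday output,
# exactly as in the comment above.
monday, tuesday = date(2011, 1, 3), date(2011, 1, 4)   # a Monday and a Tuesday
work_both_days = [(monday, 5, 0), (tuesday, 0, 7)]     # utility = 12
rest_then_kill = [(monday, 0, 0), (tuesday, 0, 9)]     # utility = 9
print(utility(work_both_days), utility(rest_then_kill))
```

The point is only that “save on Mondays, kill on Tuesdays” is not evidence against goal-directedness; it just pins down an unusual goal.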
I think this only sounds strange because humans don’t have any temporal terminal values, and so there is an implicit moral axiom of invariance in time.
I don’t believe this is the case. I need to use temporal terminal values to model the preferences that I seem to have.
If you are not talking about temporal discounting (which I mentioned), then as your comment stands I can only see that there is disagreement, but I don’t understand why. (Values I can think of whose expression is plausibly time-dependent seem to be better explained in terms of context.)
Yes, this is the most obvious one. I’m not sure if there are others. I would not have mentioned this if I had noticed your caveat.
rwallace addressed your premise here.