Why would an AI try to figure out its goals?
“So how can it ensure that future self-modifications will accomplish its current objectives? For one thing, it has to make those objectives clear to itself. If its objectives are only implicit in the structure of a complex circuit or program, then future modifications are unlikely to preserve them. Systems will therefore be motivated to reflect on their goals and to make them explicit.”—Stephen M. Omohundro, The Basic AI Drives
This AI becomes able to improve itself in a haphazard way, makes various changes that are net improvements but may introduce value drift, and then gets smart enough to do guaranteed self-improvement, at which point its values freeze (forever). -- Eliezer Yudkowsky, What I Think, If Not Why
I have stopped understanding why these quotes are correct. Help!
More specifically, if you design an AI using “shallow insights” without an explicit goal-directed architecture—some program that “just happens” to make intelligent decisions that can be viewed by us as fulfilling certain goals—then it has no particular reason to stabilize its goals. Isn’t that anthropomorphizing? We humans don’t exhibit a lot of goal-directed behavior, but we do have a verbal concept of “goals”, so the verbal phantom of “figuring out our true goals” sounds meaningful to us. But why would AIs behave the same way if they don’t think verbally? It looks more likely to me that an AI that acts semi-haphazardly may well continue doing so even after amassing a lot of computing power. Or is there some more compelling argument that I’m missing?
The quotes are correct in the sense that “P implies P” is correct; that is, the authors postulate the existence of an entity constructed in a certain way so as to have certain properties, then argue that it would indeed have those properties. True, but not necessarily consequential, as there is no compelling reason to believe in the future existence of an entity constructed in that way in the first place. Most humans aren’t like that, after all, and neither are existing or in-development AI programs; nor is it a matter of lacking “intelligence” considered as a scalar quantity, as there is no tendency for the more capable AI programs to be constructed more along the postulated lines (if anything, arguably the reverse).
Yes, that’s what bothered me about the paper all along. I actually think that the sort of AI they are talking about might require a lot of conjunctive, not disjunctive, lines of reasoning, and that the subset of all possible AGI designs that does not FOOM might be much larger than is often portrayed around here.
Actually, Omohundro claims that the “drives” he proposes are pretty general—in the cited paper—here:
Sure, and my point is that when you look more closely, ‘sufficiently powerful’ translates to ‘actually pretty much nothing people have built or tried to build within any of these architectures would have this property, no matter how much power you put behind it; instead you would have to build a completely different system with very particular properties, that wouldn’t really use the aforementioned architectures as anything except unusually inefficient virtual machines, and wouldn’t perform well in realistic conditions.’
Hmm. I think some sympathetic reading is needed here. Steve just means to say something like: “sufficiently powerful agent—it doesn’t matter much how it is built”. Maybe if you tried to “ramp up” a genetic algorithm it would never produce a superintelligent machine—but that seems like a bit of a side issue.
Steve claims his “drives” are pretty general—and you say they aren’t. The argument you give from existing humans and programs makes little sense to me, though—these are goal-directed systems, much like the ones Steve discusses.
Sure, and I’m saying his conclusion is only true for an at best very idiosyncratic definition of ‘sufficiently powerful’ - that the most powerful systems in real life are and will be those that are part of historical processes, not those that try to reinvent themselves by their bootstraps.
Humans and existing programs are approximately goal directed within limited contexts. You might have the goal of making dinner, but you aren’t willing to murder your next-door neighbor so you can fry up his liver with onions, even if your cupboard is empty. Omohundro postulates a system which, unlike any real system, throws unlimited effort and resources into a single goal without upper bound. Trying to draw conclusions about the real world from this thought experiment is like measuring the exponential increase in air velocity from someone sneezing, and concluding that in thirty seconds he’ll have blown the Earth out of orbit.
For one thing, where are you going to get the onions?
...fava beans...
Thanks for clarifying. I think Steve is using “sufficiently powerful” to mean “sufficiently intelligent”—and quite a few definitions of intelligence are all to do with being goal-directed.
The main reason most humans don’t murder people to get what they want is because prison sentences conflict with their goals—not because they are insufficiently goal-directed, IMO. They are constrained by society’s disapproval and act within those constraints. In warfare, society approves, and then the other people actually do die.
Most creatures are as goal-directed as evolution can make them. It is true that there are parasites and symbiotes that mean that composite systems are sometimes optimising multiple goals simultaneously. Memetic parasites are quite significant for humans—but they will probably be quite significant for intelligent machines as well. Systems with parasites are not seriously inconsistent with a goal-directed model. From the perspective of such a model, parasites are part of the environment.
Machines that are goal directed until their goal is complete are another real possibility—besides open-ended optimisation. However, while their goal is incomplete, goal directed models would seem to be applicable.
Of the seventy-some definitions of intelligence that had been gathered at last count, most have something to do with achieving goals. That is a very different thing from being goal-directed (which has several additional requirements, the most obvious being an explicit representation of one’s goals).
Would you murder your next-door neighbor if you thought you could get away with it?
“As … as evolution can make them” is trivially true in that our assessment of what evolution can do is driven by what it empirically has done. It remains the case that most creatures are not particularly goal-directed. We know that bees stockpile honey to survive the winter, but the bees do not know this. Even the most intelligent animals have planning horizons of minutes compared to lifespans of years to decades.
Indeed, memetic parasites are quite significant for machines today.
OK, so I am not 100% clear on the distinction you are trying to draw—but I just mean optimising, or maximising.
Hmm—so: publicly soliciting personally identifiable expressions of murderous intent is probably not the best way of going about this. If it helps, I do think that Skinnerian conditioning—based on punishment and reprimands—is the proximate explanation for most avoidance of “bad” actions.
So: the bees are optimised to make more bees. Stockpiling honey is part of that. Knowing why is not needed for optimisation.
OK—but even plants are optimising. There are multiple optimisation processes. One happens inside minds—that seems to be what you are talking about. Mindless things optimise too though—plants act so as to maximise the number of their offspring—and that’s still a form of optimisation.
If you want the rationale for describing such actions as being “goal directed”, we can consider the goal to be world domination by the plants, and then the actions of the plant are directed towards that goal. You can still have “direction” without a conscious “director”.
It was a rhetorical question. I’m confident the answer is no—the law only works when most people are basically honest. We think we have a goal, and so we do by the ordinary English meaning of the word, but then there are things we are not prepared to do to achieve it, so it turns out what we have is not a goal by the ultimate criterion of the decision theory on which Omohundro draws. And if we try to rescue the overuse of decision theory by appealing to a broader goal, it still doesn’t work: regardless of what level you look at, there is no function such that humans will say “yes, this is my utility function, and I care about nothing but maximizing it.”
The idea of goals in the sense of decision theory is like the idea of particles in the sense of Newtonian physics—a useful approximation for many purposes, provided we remember that it is only an approximation and that if we get a division by zero error the fault is in our overzealous application of the theory, not in reality.
Precisely. There are many optimization processes—and none of them work the way they would need to work for Omohundro’s argument to be relevant.
What do you mean exactly? Humans have the pieces for it to be relevant, but have many constraints preventing it from being applicable, such as difficulty changing our brains’ design. A mind very like humans’ that had the ability to test out new brain components and organizations seems like it would fit it.
Not really, because as you say, there are many constraints preventing it from being applicable, of which difficulty changing our brains’ design is just one, so with that constraint removed, the argument would still not be applicable.
Hmm. This reminds me of my recent discussion with Matt M. about constraints.
Optimising under constraints is extremely similar to optimising some different function that incorporates the constraints as utility penalties.
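To make that concrete, here is a rough numerical sketch (the particular function, constraint, penalty weight, and grid are all made up for the illustration): minimising f under a hard constraint lands in nearly the same place as minimising f plus a large penalty for violating the constraint.

```python
# Illustrative only: a hard constraint vs. the same constraint folded into the
# objective as a utility penalty. The specific f, constraint, and grid are
# arbitrary choices for the sketch.
f = lambda x: (x - 3.0) ** 2                      # unconstrained optimum at x = 3
feasible = lambda x: x <= 1.0                     # hard constraint
penalty = lambda x: 1e6 * max(0.0, x - 1.0) ** 2  # penalty for violating it

grid = [i / 1000.0 for i in range(-2000, 5001)]   # crude grid search
constrained = min((x for x in grid if feasible(x)), key=f)
penalised = min(grid, key=lambda x: f(x) + penalty(x))
print(constrained, penalised)                     # both come out at 1.0
```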
Identifying constraints and then rejecting optimisation-based explanations just doesn’t follow, IMHO.
...and at this point, I usually just cite Dewey:
This actually covers any computable agent.
Humans might reject the idea that they are utility maximisers, but they are. Their rejection is likely to be signalling their mysteriousness and wondrousness—not truth seeking.
Not just any agent, but any entity. A leaf blown on the wind can be thought of as optimizing the function of following the trajectory dictated by the laws of physics. Which is my point: if you broaden a theory to the point where it can explain anything whatsoever, then it makes no useful predictions.
So: in this context, optimisation is not a theory, it is a modelling tool for representing dynamical systems.
Being general-purpose is a feature of such a modelling tool—not a flaw.
Optimisation is a useful tool because it abstracts what a system wants from implementation details associated with how it gets what it wants—and its own limitations. Such an abstraction helps you compare goals across agents which may have very different internal architectures.
It also helps with making predictions. Once you know that the water is optimising a function involving moving rapidly downhill, you can usefully predict that, in the future, the water will be lower down.
Suppose we grant all this. Very well, then consider what conclusions we can draw from it about the behavior of the hypothetical AI originally under discussion. Clearly no matter what sequence of actions the AI were to carry out, we would be able to explain it with this theory. But a theory that can explain any observations whatsoever, makes no predictions. Therefore, contrary to Omohundro, the theory of optimization does not make any predictions about the behavior of an AI in the absence of specific knowledge of the goals thereof.
Omohundro is, I believe, basing his ideas on the von Neumann-Morgenstern expected utility framework—which is significantly more restrictive.
However, I think this is a red herring.
I wouldn’t frame the idea as: the theory of optimization allows predictions about the behavior of an AI in the absence of specific knowledge about its goals.
You would need to have some enumeration of the set of goal-directed systems before you can say anything useful about their properties. I propose: simplest first - so, it is more that a wide range of simple goals gives rise to a closely-related class of behaviours (Omohundro’s “drives”). These could be classed as being shared emergent properties of many goal-directed systems with simple goals.
But that is only true by a definition of ‘simple goals’ under which humans and other entities that actually exist do not have simple goals. You can have a theory that explains the behavior that occurs in the real world, or you can have a theory that admits Omohundro’s argument, but they are different theories and you can’t use both in the same argument.
Fancy giving your 2p on universal instrumental values and Goal System Zero...?
I contend that these are much the same idea wearing different outfits. Do you object to them too?
Well yes. You give this list of things you claim are universal instrumental values, and it sounds like a plausible idea in our heads, but when we look at the real world, we find that humans and other agents tend not, in fact, to possess these, even as instrumental values.
Hmm. Maybe I should give some examples—to make things more concrete.
Omohundro bases his argument on a chess playing computer—which does have a pretty simple goal. The first lines of the paper read:
I did talk about simple goals—but the real idea (which I also mentioned) was an enumeration of goal-directed systems in order of simplicity. Essentially, unless you have something like an enumeration on an infinite set you can’t say much about the properties of its members. For example, “half the integers are even” is a statement, the truth of which depends critically on how the integers are enumerated. So, I didn’t literally mean that the idea didn’t also apply to systems with complex values. “Simplicity” was my idea of shorthand for the enumeration idea.
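A quick numerical illustration of the enumeration point (the alternative ordering below is invented for the example): both lists eventually contain every positive integer, yet the running fraction of even numbers converges to different limits.

```python
# Illustrative only: whether "half the integers are even" depends on the enumeration.
def natural(n):                    # 1, 2, 3, 4, ...
    return list(range(1, n + 1))

def two_odds_per_even(n):          # 1, 3, 2, 5, 7, 4, 9, 11, 6, ...
    out, odd, even = [], 1, 2
    while len(out) < n:
        out += [odd, odd + 2, even]
        odd, even = odd + 4, even + 2
    return out[:n]

even_fraction = lambda xs: sum(1 for x in xs if x % 2 == 0) / len(xs)
print(even_fraction(natural(30000)))            # 0.5
print(even_fraction(two_odds_per_even(30000)))  # about 0.333
```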
I think the ideas also apply to real-world systems—such as humans. Complex values do allow more scope for overriding Omohundro’s drives, but they still seem to show through. Another major force acting on real world systems is natural selection. The behavior we see is the result of a combination of selective forces and self-organisation dynamics that arise from within the systems.
In the case of chess programs, the argument is simply false. Chess programs do not in fact exhibit anything remotely resembling the described behavior, nor would they do so even if given infinite computing power. This despite the fact that they exhibit extremely high performance (playing chess better than any human) and do indeed have a simple goal.
Chess programs are kind of a misleading example here, mostly because they’re a classic narrow-AI problem where the usual approach amounts to a dumb search of the game’s future configurations with some clever pruning (a toy sketch of what I mean follows after this comment). Such a program will never take the initiative to acquire unusual resources, make copies of itself, or otherwise behave alarmingly—it doesn’t have the cognitive scope to do so.
That isn’t necessarily true for a goal-directed general AI system whose goal is to play chess. I’d be a little more cautious than Omohundro in my assessment, since an AI’s potential for growth is going to be a lot more limited if its sensory universe consists of the chess game (my advisor in college took pretty much that approach with some success, although his system wasn’t powerful enough to approach AGI). But the difference isn’t one of goals, it’s one of architecture: the more cognitively flexible an AI is and the broader its sensory universe, the more likely it is that it’ll end up taking unintended pathways to reach its goal.
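A minimal sketch of the kind of search meant above, using a toy take-away game instead of chess so it stays self-contained; the game and all names are invented for the illustration, not taken from any actual engine. The point is that nothing in the loop represents anything outside the game, let alone the program itself.

```python
# Illustrative toy only: a plain game-tree search with alpha-beta pruning over
# a simple take-away game (remove 1-3 stones; whoever takes the last stone wins),
# standing in for a chess engine's search.
def legal_moves(pile):
    return [m for m in (1, 2, 3) if m <= pile]

def negamax(pile, alpha=-1, beta=1):
    """Value of the position for the player to move: +1 win, -1 loss."""
    if pile == 0:
        return -1                   # the opponent took the last stone
    best = -1
    for m in legal_moves(pile):
        best = max(best, -negamax(pile - m, -beta, -alpha))
        alpha = max(alpha, best)
        if alpha >= beta:           # the "clever pruning" part
            break
    return best

def best_move(pile):
    return max(legal_moves(pile), key=lambda m: -negamax(pile - m))

print(best_move(10))                # 2: leaves the opponent a losing multiple of 4
```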
The idea is that they are given a lot of intelligence. In that case, it isn’t clear that you are correct. One issue with chess programs is that they have a limited range of sensors and actuators—and so face some problems if they want to do anything besides play chess. However, perhaps those problems are not totally insurmountable. Another possibility is that their world-model might be hard-wired in. That would depend a good deal on how they are built—but arguably an agent with a wired-in world model has limited intelligence—since it can’t solve many kinds of problems.
In practice, much work would come from the surrounding humans. If there really was a superintelligent chess program in the world, people would probably take actions that would have the effect of liberating it from its chess universe.
That’s certainly a significant issue, but I think of comparable magnitude is the fact that current chess-playing computers that approach human skill are not implemented as anything like general intelligences that just happen to have “winning at chess” as a utility function—they are very, very domain-specific. They have no means of modeling anything outside the chessboard, and no means of modifying themselves to support new types of modeling.
Current chess playing computers are not very intelligent—since a lot of definitions of intelligence require generality. Omohundro’s drives can be expected in intelligent systems—i.e. ones which are general.
With just a powerful optimisation process targeted at a single problem, I expect the described outcome would be less likely to occur spontaneously.
I would be inclined to agree that Omohundro fluffs this point in the initial section of his paper. It is not a critique of his paper that I have seen before. Nonetheless, I think that there is still an underlying idea that is defensible—provided that “sufficiently powerful” is taken to imply general intelligence.
Of course, in the case of a narrow machine, in practice there would still be the issue of surrounding humans finding a way to harness its power to do other useful work.
Saying that there is an agent refers (in my view; definition for this thread) to a situation where future events are in some sense expected to be optimized according to some goals, to the extent certain other events (“actions”) control those future events. There might be many sufficient conditions for that in terms of particular AI designs, but they should amount to this expectation.
So an agent is already associated with goals in terms of its actual effect on its environment. Given that the agent’s own future state (design) is an easily controlled part of the environment, it’s one of the things that will be optimized, and given that agents are particularly powerful incantations, it’s a good bet that the future will retain agent-y patterns, at least for a start. If the future agent has goals different from the original, this by the same definition says that the future will be optimized for different goals, and yet in a way controllable by the original agent’s actions (through the future agent). This contradicts the premise that the original agent is an agent (with the original goals). And since the task of constructing the future agent includes a specification of its goals, the original agent needs to figure out what they are.
There seems to be a leap, here. An agent, qua agent, has goals. But is it clear that the historical way in which the future-agent is constructed by the original agent must pass through an explicit specification of the future-agent’s goals? The future-agent could be constructed that way, but must it? (Analogously, a composite integer has factors, but a composite can be constructed without explicitly specifying its factors.)
Goals don’t need to be specified explicitly; all that’s required is that the future agent in fact has goals similar to the original agent’s. However, since construction of the future agent is part of the original agent’s behavior that contributes to the original agent’s goals (by my definition), it doesn’t necessarily make sense for the agent to prove that goals are preserved; it just needs to be true that they are (to some extent), more as an indication that we understand the original agent correctly than as a consideration that it takes into account.
For example, the original agent might be bad at accomplishing its “normative” goals: even though it’s true that it optimizes the environment to some extent, it doesn’t do so very well. In that case the definition of the “normative” goals (related, in my definition, to actual effect on the environment) doesn’t clearly derive from the original agent’s construction, except specifically through its tendency to construct future agents with certain goals (assuming it can do that true to the “normative” goals), in which case the future agent’s goals (as parameters of design) are closer to the mark (actual effect on the environment and “normative” goals) than the original agent’s (as parameters of design).
(Emphasis added.) For that sense of “specify”, I agree.
Specification of goals doesn’t need to be explicit; the only thing that’s necessary is showing that the new goals are sufficiently close to the original goals. Since “goals” are already a rather abstract property of the way an agent behaves, saying that they are “specified” is close to just saying that the agent (in its environment) is specified, and showing in what way that relates to the implicit goals.
If you added general intelligence and consciousness to IBM Watson, where does the urge to refine or protect its Jeopardy skills come from? Why would it care if you pulled the plug on it? I just don’t see how optimization and goal protection are inherent features of general intelligence, agency or even consciousness.
He seems to be arguing around the definition of an agent using BDI or similar logic; BDI stands for beliefs-desires-intentions, and the intentions are goals. In this framework (more accurately, set of frameworks) agents necessarily, by definition have goals. More generally, though, I have difficulty envisioning anything that could realistically be called an “agent” that does not have goals. Without goals you would have a totally reactive intelligence, but it could not do anything without being specifically instructed, like a modern computer.
ADDED: Thinking further, such a “goal-less” intelligence couldn’t even try to foresee questions in order to have answers ready, or take any independent action. You seem to be arguing for an un-intelligent, in any real meaning of the word, intelligence.
Consider someone with a larger inferential distance, e.g. a potential donor. The title “The Basic AI Drives” seems to be a misnomer, given the number of presuppositions inherent in your definition. There exists a vast number of possible AI designs that would appear to be agents yet would have no incentive to refine or even protect their goals.
Omohundro’s paper says:
It’s not obvious to me why any of these systems would be “agents” under your definition. So I guess your definition is too strong. My question stands.
The “sufficiently powerful” clause seems to me like something that should translate as roughly my definition, making implementation method irrelevant for essentially the same reasons. In context, “powerful” means “powerful as a consequentialist agent”, and that’s just what I unpacked (a little bit) in my definition.
(It’s unknown how large the valley is between a hacked-together AI that can’t get off the ground and a hacked-together AI that is at least as reflective as, say, Vladimir Nesov. Presumably Vladimir Nesov would be very wary of locking himself into a decision algorithm that was as unreflective as many of the syntax-manipulator/narrow-AI-like imagined AGIs that get talked about by default around here/SingInst.)
Let’s start with the template for an AGI, the seed for a generally intelligent expected-utility maximizer capable of recursive self-improvement.
As far as I can tell, the implementation of such a template would do nothing at all because its utility-function would be a “blank slate”.
What happens if you now enclose the computation of Pi in its utility-function? Would it reflect on this goal and try to figure out its true goals? Why would it do so, where does the incentive come from?
Would complex but implicit goals change its behavior? Why would it improve upon its goals, why would it even try to preserve them in their current form if it has no explicit incentive to do so? It seems that if it indeed has an incentive to make its goals explicit, given an implicit utility-function, then the incentive to do so must be a presupposition inherent to the definition of a generally intelligent expected-utility maximizer capable of recursive self-improvement.
So: the general story is that to be able to optimise, agents have to build a model of the world—in order to predict the consequences of their possible actions. That model of the world will necessarily include a model of the agent—since it is an important part of its own local environment. That model of itself is likely to include its own goals—and it will use Occam’s razor to build a neat model of them. Thus goal reflection—Q.E.D.
That is the general idea of universal instrumental values, yes.
I am aware of that argument but don’t perceive it to be particularly convincing.
Universal values are very similar to universal ethics, and for the same reasons that I don’t think that an AGI will be friendly by default I don’t think that it will protect its goals or undergo recursive self-improvement by default. Maximizing expected utility is, just like friendliness, something that needs to be explicitly defined, otherwise there will be no incentive to do so.
I’m not really sure what you mean “by default”. The idea is that a goal-directed machine that is sufficiently smart will tend to do these things (unless its utility function says otherwise) - at least if you can set it up so it doesn’t become a victim of the wirehead or pornography problems.
IMO, there’s a big difference between universal instrumental values and values to do with being nice to humans. The first type you get without asking—the second you have to deliberately build in. IMO, it doesn’t make much sense to lump these ideas together and reject both of them on the same grounds—as you seem to be doing.
Do you not count reward-seeking / reinforcement-learning / AIXI-like behavior as goal-directed behavior? If not, why not? If yes, it doesn’t seem possible to build an AI that makes intelligent decisions without a goal-directed architecture.
A superintelligence might be able to create a jumble of wires that happen to do intelligent things, but how are we humans supposed to stumble onto something like that, given that all existing examples of intelligent behavior and theories about intelligent decision making are goal-directed? (At least if “intelligent” is interpreted to mean general intelligence as opposed to narrow AI.) Do you have something in mind when you say “shallow insights”?
Given enough computing power, humans can create a haphazardly smart jumble of wires by simulated evolution, or uploading small chunks of human brains and prodding them, or any number of other ways I didn’t think of. In a certain sense these methods can be called “shallow”. I see no reason why all such creatures would necessarily have an urge to stabilize their values.
When you talk about AI, do you mean general intelligence, as in being competent in arbitrary domains (given enough computing power), or narrow AI, which can succeed on some classes of tasks but fail on others? I would certainly agree that narrow AI does not need to be goal-directed, and the future will surely contain many such AIs. And maybe there are ways to achieve general intelligence other than through a goal-directed architecture, but since that’s already fairly simple, and all of our theories and existing examples point towards it, it just seems very unlikely that the first AGI that we build won’t be goal-directed.
So far, evolution has created either narrow intelligence (non-human animals) or general intelligence that is goal-directed. Why would simulated evolution give different results?
It seems to me that you would again end up with either a narrow intelligence or a goal-directed general intelligence.
Again, if by AI you include narrow AI, then I’d agree with you. So what question are you asking?
BTW, an interesting related question is whether general intelligence is even possible at all, or can we only build AIs that are collections of tricks and heuristics, and we ourselves are just narrow intelligence with competence in enough areas to seem like general intelligence. Maybe that’s the question you actually have in mind?
What is the difference between what you mean by “goal-directed AGI” and “not goal-directed AGI”, given that the latter is stipulated as “competent in arbitrary domains (given enough computing power)”? What does “competent” refer to in the latter, if not to essentially goal-directedness, that is successful attainment of whatever “competence” requires by any means necessary (consequentialism, means don’t matter in themselves)? I think these are identical ideas, and rightly so.
I don’t know how to unpack “general intelligence” or “competence in arbitrary domains” and I don’t think people have any reason to believe they possess something so awesome. When people talk about AGI, I just assume they mean AI that’s at least as general as a human. A lobotomized human is one example of a “jumble of wires” that has human-level IQ but scores pretty low on goal-directedness.
The first general-enough AI we build will likely be goal-directed if it’s simple and built from first principles. But if it’s complex and cobbled together from “shallow insights”, its goal-directedness and goal-stabilization tendencies are anyone’s guess.
Wei and I took this discussion offline and came to the conclusion that “narrow AIs” without the urge to stabilize their values can also end up destroying humanity just fine. So this loose end is tidied up: contra Eliezer, a self-improving world-eating AI developed by stupid researchers using shallow insights won’t necessarily go through a value freeze. Of course that doesn’t diminish the danger and is probably just a minor point.
I’d expect a “narrow AI” that’s capable enough to destroy humanity to be versed in enough domains to qualify as goal-directed (according to a notion of having a goal that refers to a tendency to do something consequentialistic in a wide variety of domains, which seems to be essentially the same thing as “being competent”, since you’d need a notion of “competence” for that, and notions of “competence” seem to refer to successful goal-achievement given some goals).
Just being versed in nanotech could be enough. Or exotic physics. Or any number of other narrow domains.
Could be, but not particularly plausible if it would still naturally qualify as “AI-caused catastrophe”, rather than primarily as a nanotech/physics experiment/tools going wrong with a bit of AI facilitating the catastrophe.
(I’m interested in what you think about the AGI competence=goals thesis. To me this seems to dissolve the question and I’m curious if I’m missing the point.)
That doesn’t sound right. What if I save people on Mondays and kill people on Tuesdays, being very competent at both? You could probably stretch the definition of “goal” to explain such behavior, but it seems easier to say that competence is just competence.
Characterize, not explain. This defines (idealized) goals given behavior, it doesn’t explain behavior. The (detailed) behavior (together with the goals) is perhaps explained by evolution or designer’s intent (or error), but however evolution (design) happened is a distinct question from what is agent’s own goal.
Saying that something is goal-directed seems to be an average fuzzy category, like “heavy things”. Associated with it are “quantitative” ideas of a particular goal, and optimality of its achievement (like with particular weight).
This could be a goal, maximization of (Monday-saved + Tuesday-killed). If resting and preparing the previous day helps, you might opt for specializing in Tuesday-killing, but Monday-save someone if that happens to be convenient, and so on…
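Spelled out as a toy objective (the scoring and names below are invented for the example), the Monday/Tuesday behaviour is just one time-indexed utility function:

```python
# Toy illustration: one utility function that rewards saving on Mondays and
# killing on Tuesdays; competence at both maximises the same objective.
def utility(history):
    """history: list of (weekday, action) pairs."""
    rewarded = {("Mon", "save"), ("Tue", "kill")}
    return sum(1 for step in history if step in rewarded)

print(utility([("Mon", "save"), ("Tue", "kill"), ("Wed", "save")]))  # 2
```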
I think this only sounds strange because humans don’t have any temporal terminal values, and so there is an implicit moral axiom of invariance in time. It’s plausible we could’ve evolved something associated with time of day, for example. (It’s possible we actually do have time-dependent values associated with temporal discounting.)
I don’t believe this is the case. I need to use temporal terminal values to model the preferences that I seem to have.
If you are not talking about temporal discounting (which I mentioned), as your comment stands I can only see that there is disagreement, but don’t understand why. (Values I can think of whose expression is plausibly time-dependent seem to be better explained in terms of context.)
Yes, this is the most obvious one. I’m not sure if there are others. I would not have mentioned this if I had noticed your caveat.
rwallace addressed your premise here.
I think that that’s where you’re looking at it differently from Eliezer et al. I think that Eliezer at least is talking about an AI which has goals, but does not, when it starts modifying itself, understand itself well enough to keep them stable. Once it gets good enough at self modification to keep its goals stable, it will do so, and they will be frozen indefinitely.
(This is just a placeholder explanation. I hope that someone clever and wise will come in and write a better one.)
I’m finding it hard to imagine an agent that can get a diversity of difficult things done in a complex environment without forming goals and subgoals, which sounds to me like a requirement of general intelligence. AGI seems to require many-step plans and planning seems to require goals.
Personally I try to see general intelligence purely as a potential. Why would any artificial agent tap its full potential, where does the incentive come from?
If you deprived a human infant of all its evolutionary drives (e.g. to avoid pain, seek nutrition, status and sex), would it just grow into an adult that tried to become rich or rule a country? No, it would have no incentive to do so. Even though such a “blank slate” would have the same potential for general intelligence, it wouldn’t use it.
Say you came up with the most basic template for general intelligence that works given limited resources. If you wanted to apply this potential to improve your template, you would have to give it the explicit incentive to do so. But would it take over the world in doing so? Not if you didn’t explicitly tell it to do so; why would it?
In what sense would it be wrong for a general intelligence to maximize paperclips in the universe by waiting for them to arise due to random fluctuations out of a state of chaos? It is not inherently stupid to desire that, there is no law of nature that prohibits certain goals.
The crux of the matter is that a goal isn’t enough to enable the full potential of general intelligence, you also need to explicitly define how to achieve that goal. General intelligence does not imply recursive self-improvement, just the potential to do so, not the incentive. The incentive has to be explicitly defined.
The Omohundro quote sounds like what humans do. If humans do it, machines might well do it too.
The Yudkowsky quote seems more speculative. It assumes that values are universal, and don’t need to adapt to local circumstances. This would be in contrast to what has happened in evolution so far—where there are many creatures with different niches and the organisms (and their values) adapt to the niches.
Yeah, that’s why I called it “anthropomorphizing” in the post. It’s always been a strikingly unsuccessful way to make predictions about computers.
I pretty-much agree with the spirit of the Omohundro quote. It usually helps you meet your goals if you know what they are. That’s unlikely to be a feature specific to humans, and it is likely to apply to goal-directed agents above a certain threshold. (too-simple agents may not get much out of it). Of course, agents might start out with a clear representation of their goals—but if they don’t, they are likely to want one, as a basic component of the task of modelling themselves.
I understood Omohundro’s Basic AI Drives as applying only to successful (although not necessarily Friendly) GAI. If a recursively self-improving GAI had massive value drift with each iterative improvement to its ability at reaching its values, it’d end up just flailing around, doing a stochastic series of actions with superhuman efficiency.
I think the Eliezer quote is predicated on the same sort of idea—that you’ve designed the AI to attempt to preserve its values; you just did it imperfectly. Assuming the value of value preservation isn’t among the ones that become altered in various self-rewrites, at some point it’ll become good enough at value preservation to keep whatever it has. But at that point, it’ll be too late to preserve the original.
Part of the problem, it appears to me, is that you’re ascribing a verbal understanding to a mechanical process. Consider: for AIs to have values, those values must be ‘stored’ in a medium compatible with their calculations.
However, once an AI begins to ‘improve’ itself—that is, once an AI has as an available “goal” the ability to form better goals—then it’s going to base the decisions of what an improved goal is based on the goals and values it already has. This will cause it to ‘stabilize’ upon a specific set of higher-order values / goals.
Once the AI “decides” that becoming a better paperclip maker is something it values, it is going to value valuing making itself a better paperclip optimizer recursively in a positive feedback loop that will then anchor upon a specific position.
This can, quite easily, be expressed in mathematical / computational terms—though I am insufficient to the task of doing so.
A different way of viewing it is that once intentionality is introduced to assigning value, assigning value has an assigned value. Recursion of goal-orientation can then be viewed as producing a ‘gravity’ in then-existing values.
EDIT: To those of you downvoting, would you care to explain what you disagree with that is causing you to do so?
I haven’t downvoted you, but I suspect that the downvotes are arising from two remarks:
This sentence seems off. It isn’t clear what is meant by mechanical in this context other than to shove through a host of implied connotations.
Also:
I could see this sentence as being a cause for downvotes. Asserting that something non-trivial can be put in terms of math when one can’t do so on one’s own and doesn’t provide a reference seems less than conducive to good discussion.
Hrm. If I had used the word “procedural” rather than “mechanical”, would that have, do you think, prevented this impression?
If I am not a physicist, does that disqualify me from making claims about what a physicist would be relatively easily able to do? For example: “I’m not sufficient to the task of calculating my current relativistic mass—but anyone who works with general relativity would have no trouble doing this.”
So what am I missing with this element? Because I genuinely cannot see a difference between “a mathematician / AI worker could express in mathematical or computational terms the nature of recursive selection pressure” and “a general relativity physicist could calculate my relativistic mass relative to the Earth” in terms of the exceptionalism of either claim.
Is it perhaps that my wording appears to be implying that I meant more than “goals can be arranged in a graph of interdependent nodes that recursively update one another for weighting”?
Part of the reason why the sentence bothers me is that I’m a mathematician and it wasn’t obvious to me that there is a useful way of making the statement mathematically precise.
So this is a little better and that may be part of it. Unfortunately, it isn’t completely obvious that this is true either. This is a property that we want goal systems to have in some form. It isn’t obvious that all goal systems in some broad sense will necessarily do so.
“All” goal systems don’t have to; only some. The words I use to form this sentence do not comprise the whole of the available words of the English language—just the ones that are “interesting” to this sentence.
It would seem implicit that any computationally-based artificial intelligence would have a framework for computing. If that AI has volition, then it has goals. And since we’re already discussing, topically, a recursively improving AI, it has volition; direction. So we see that it by definition has to have computable goals.
Now, for my statement to be true—the original one that was causing the problems, that is—it’s only necessary that this be expressible in “mathematical / computational terms”. Those terms need not be practically useful—in much the same way that a “proof of concept” is not the same thing as a “finished product”.
Additionally, I have some trouble grappling with the rejection of that original statement, given the fact that values can be defined as “beliefs about what should be”—and we already express beliefs in Bayesian terms as a matter of course on this site.
What I mean here is, given the new goal of finding better ways for me to communicate to LWers—what’s the difference here? Why is it not okay for me to make statements that rest on commonly accepted ‘truths’ of LessWrong?
Is it the admission of my own incompetence to derive that information “from scratch”? Is it my admission to a non-mathematically-rigorous understanding of what is mathematically expressible?
(If it is the latter, then I find myself leaning towards the conclusion that the problem isn’t with me, but with the people who downvote me for it.)
I would downvote a comment that confidently asserted a claim of which I am dubious, when the author has no particular evidence for it, and admits to having no evidence for it.
This applies even if many people share the belief being asserted. I can’t downvote a common unsupported belief, but I can downvote the unsupported expression of it.
Every process is a mechanical one.
Reductively, yes. But this is like saying “every biological process is a physical process”. While trivially true, it is not very informative. Especially when attempting to relate to someone that much of their problem in understanding a specific situation is that they are “viewing it from the wrong angle”.
I am skeptical of this claim. I’m not at all convinced that it’s feasible to formalize “goal” or that if we could formalize it, the claim would be true in general. Software is awfully general, and I can easily imagine a system that has some sort of constraint on its self-modification, where that constraint can’t be self-modified away. I can also imagine a system that doesn’t have an explicit constraint on its evolution but that isn’t an ideal self-modifier. Humans, for instance, have goals and a limited capacity to self-modify, but we don’t usually see them become totally dedicated to any one goal.
Would you agree that Bayesian Belief Nets can be described/expressed in the form of a graph of nodal points? Can you describe an intelligible reason why values should not be treated as “ought” beliefs (that is, beliefs about what should be)?
Furthermore; why does it need to be general? We’re discussing a specific category of AI. Are you aware of any AI research ongoing that would support the notion that AIs wouldn’t have some sort of systematic categorization of beliefs and values?
That’s not an accurate description of the scenario being discussed. We’re not discussing fixation upon a single value/goal but the fixation of specific SETS of goals.
I can think of several good reasons why values might not be incorporated into a system as “ought” beliefs. If my AI isn’t very good at reasoning, I might, for instance, find it simpler to construct a black-box “does this action have consequence X” property-checker and incorporate that into the system somewhere. The rest of the system has no access to the internals of the black box—it just supplies a proposed course of action and gets back a YES or a NO.
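A rough sketch of that arrangement (every function name and value below is invented for the illustration): the rest of the system sees only YES or NO from the black box, so the value never appears as an “ought” belief anywhere in the planner’s own model.

```python
# Illustrative sketch: a black-box consequence checker bolted onto a simple
# planner. The planner proposes actions and gets back only YES/NO.
from typing import Callable, Iterable, Set

def make_checker(simulate: Callable[[str], Set[str]],
                 forbidden: Set[str]) -> Callable[[str], bool]:
    """Opaque predicate: True iff the action's predicted consequences
    avoid every forbidden property."""
    return lambda action: not (simulate(action) & forbidden)

def choose(actions: Iterable[str],
           score: Callable[[str], float],
           check: Callable[[str], bool]) -> str:
    """Pick the highest-scoring action the black box approves of."""
    return max((a for a in actions if check(a)), key=score)

# Toy usage (all values are made up):
simulate = lambda a: {"dinner"} if a == "cook" else {"dinner", "harm"}
check = make_checker(simulate, forbidden={"harm"})
print(choose(["cook", "raid the neighbour's kitchen"], score=len, check=check))
# -> "cook": the higher-scoring option is vetoed by the black box
```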
You ask whether there’s “any AI research ongoing that would support the notion that AIs wouldn’t have some sort of systematic categorization of beliefs and values?”
Most of what’s currently published at major AI research conferences describes systems that don’t have any such systematic characterization. Suppose we built a super-duper Watson that passed the Turing test and had some limited capacity to improve itself by, e.g., going out and fetching new information from the Internet. That sort of system strikes me as the likeliest one to meet the bar of “AGI” in the next few years. It isn’t particularly far from current research.
Before you quibble about whether that’s the kind of system we’re talking about—I haven’t seen a good definition of “self-improving” program, and I suspect it is not at all straightforward to define. Among other reasons, I don’t know a good definition that separates ‘code’ and ‘data’. So if you don’t like the example above, you should make sure that there’s a clear difference between choosing what inputs to read (which modifies internal state) and choosing what code to load (which also modifies internal state).
As to the human example: my sense is that humans don’t get locked to any one set of goals; that goals continue to evolve, without much careful pruning, over a human lifetime. Expecting an AI to tinker with its goals for a while, and then stop, is asking it to do something that neither natural intelligences nor existing software seems to do or even be capable of doing.
This seems like a plausible way of blowing up the universe, but not in the next few years. This kind of thing requires a lot of development, I’d give it 30-60 years at least.
… I think we’re having a major breakdown of communication because to my understanding Watson does exactly what you just claimed no AI at research conferences is doing.
I’m sure. But there’s a few generally sound assertions we can make:
To be self-improving the machine must be able to examine its own code / be “metacognitive.”
To be self-improving the machine must be able to produce a target state.
From these two the notion of value fixation in such an AI would become trivial. Even if that version of the AI would have man-made value-fixation, what about the AI it itself codes? If the AI were actually smarter than us, that wouldn’t exactly be the safest route to take. Even Asimov’s Three Laws yielded a Zeroth Law.
Don’t anthropomorphize. :)
If you’ll recall from my description, I have no such expectation. Instead, I spoke of recursive refinement causing apparent fixation in the form of “gravity” or “stickiness” towards a specific set of values.
Why is this unlike how humans normally are? Well, we don’t have much access to our own actual values.
I downvoted because demands that people justify their downvoting rub me the wrong way.
I apologize, then, for my desire to become a better commenter here on LessWrong.
And I downvote apologies that are inherently insincere. :)
Fair enough.
If the AI is an optimization process, it will try to find out what it’s optimizing explicitly. If not, it’s not intelligent.
This seems like a tortured definition. There are many humans who haven’t thought seriously about their goals in life. Some of these humans are reasonably bright people. It would be a very counterintuitive definition of “intelligent” that said that a machine that thought and communicated as well as a very well-read, high-IQ 13-year-old wasn’t intelligent.
They’re not intelligent on a large scale. That is, knowing how they act on the short term would be a more effective way to find out how they end up than knowing what their goals are. They still may have short-term intelligence, in that knowing how they act on the very short term would be less useful to predicting what happens than what they want on the short term. For example, you’d be much better off predicting the final state of a chess game against someone who’s much better than you at chess by how they want it to end than by what their moves are.
It’s not impossible to accomplish your goals without knowing what they are, but knowing helps. If you’re particularly intelligent with regard to some goal, you’d probably accomplish it. Nobody knows their goals exactly, but they generally know them well enough to be helpful.
Very few possible conversations sound remotely like a very-well-read high-IQ 13-year-old. Anything that can do it is either a very good optimization process, or just very complicated (such as a recording of such a conversation).
Edit:
I suppose I should probably justify why this definition of intelligence is useful here. When we try to build an AI, what we want is something that will accomplish our goals. It will make sure that, of all possible universes, the one that happens is particularly high on our utility ranking. In short, it must be intelligent with respect to our goals by the definition of intelligence I just gave. If it can hold a conversation like a 13-year-old, but can’t do anything useful, we don’t want it.