Hmm. I think some sympathetic reading is needed here. Steve just means to say something like: “sufficiently powerful agent—it doesn’t matter much how it is built”. Maybe if you tried to “ramp up” a genetic algorithm it would never produce a superintelligent machine—but that seems like a bit of a side issue.
Steve claims his “drives” are pretty general—and you say they aren’t. The argument you give from existing humans and programs makes little sense to me, though—these are goal-directed systems, much like the ones Steve discusses.
Sure, and I’m saying his conclusion is only true for an at best very idiosyncratic definition of ‘sufficiently powerful’ - that the most powerful systems in real life are and will be those that are part of historical processes, not those that try to reinvent themselves by their bootstraps.
Humans and existing programs are approximately goal directed within limited contexts. You might have the goal of making dinner, but you aren’t willing to murder your next-door neighbor so you can fry up his liver with onions, even if your cupboard is empty. Omohundro postulates a system which, unlike any real system, throws unlimited effort and resources into a single goal without upper bound. Trying to draw conclusions about the real world from this thought experiment is like measuring the exponential increase in air velocity from someone sneezing, and concluding that in thirty seconds he’ll have blown the Earth out of orbit.
For one thing, where are you going to get the onions?
...fava beans...
Thanks for clarifying. I think Steve is using “sufficiently powerful” to mean “sufficiently intelligent”—and quite a few definitions of intelligence are all to do with being goal-directed.
The main reason most humans don’t murder people to get what they want is because prison sentences conflict with their goals—not because they are insufficiently goal-directed, IMO. They are constrained by society’s disapproval and act within those constraints. In warfare, society approves, and then the other people actually do die.
Most creatures are as goal-directed as evolution can make them. It is true that there are parasites and symbiotes that mean that composite systems are sometimes optimising multiple goals simultaneously. Memetic parasites are quite significant for humans—but they will probably be quite significant for intelligent machines as well. Systems with parasites are not seriously inconsistent with a goal-directed model. From the perspective of such a model, parasites are part of the environment.
Machines that are goal directed until their goal is complete are another real possibility—besides open-ended optimisation. However, while their goal is incomplete, goal directed models would seem to be applicable.
quite a few definitions of intelligence are all to do with being goal-directed
Of the seventy-some definitions of intelligence that had been gathered at last count, most have something to do with achieving goals. That is a very different thing from being goal-directed (which has several additional requirements, the most obvious being an explicit representation of one’s goals).
The main reason most humans don’t murder people to get what they want is because prison sentences conflict with their goals
Would you murder your next-door neighbor if you thought you could get away with it?
Most creatures are as goal-directed as evolution can make them
“As … as evolution can make them” is trivially true in that our assessment of what evolution can do is driven by what it empirically has done. It remains the case that most creatures are not particularly goal-directed. We know that bees stockpile honey to survive the winter, but the bees do not know this. Even the most intelligent animals have planning horizons of minutes compared to lifespans of years to decades.
Memetic parasites are quite significant for humans—but they will probably be quite significant for intelligent machines as well
Indeed, memetic parasites are quite significant for machines today.
Of the seventy-some definitions of intelligence that had been gathered at last count, most have something to do with achieving goals. That is a very different thing from being goal-directed (which has several additional requirements, the most obvious being an explicit representation of one’s goals).
OK, so I am not 100% clear on the distinction you are trying to draw—but I just mean optimising, or maximising.
Would you murder your next-door neighbor if you thought you could get away with it?
Hmm—so: publicly soliciting personally identifiable expressions of murderous intent is probably not the best way of going about this. If it helps, I do think that Skinnerian conditioning—based on punishment and reprimands—is the proximate explanation for most avoidance of “bad” actions.
It remains the case that most creatures are not particularly goal-directed. We know that bees stockpile honey to survive the winter, but the bees do not know this.
So: the bees are optimised to make more bees. Stockpiling honey is part of that. Knowing why is not needed for optimisation.
Even the most intelligent animals have planning horizons of minutes compared to lifespans of years to decades.
OK—but even plants are optimising. There are multiple optimisation processes. One happens inside minds—that seems to be what you are talking about. Mindless things optimise too though—plants act so as to maximise the number of their offspring—and that’s still a form of optimisation.
If you want the rationale for describing such actions as being “goal directed”, we can consider the goal to be world domination by the plants, and then the actions of the plant are directed towards that goal. You can still have “direction” without a conscious “director”.
Hmm—so: publicly soliciting personally identifiable expressions of murderous intent is probably not the best way of going about this
It was a rhetorical question. I’m confident the answer is no—the law only works when most people are basically honest. We think we have a goal, and so we do by the ordinary English meaning of the word, but then there are things we are not prepared to do to achieve it, so it turns out what we have is not a goal by the ultimate criterion of decision theory on which Omohundro draws, and if we try to rescue the overuse of decision theory by appealing to a broader goal, it still doesn’t work; regardless of what level you look at, there is no function such that humans will say “yes, this is my utility function, and I care about nothing but maximizing it.”
The idea of goals in the sense of decision theory is like the idea of particles in the sense of Newtonian physics—a useful approximation for many purposes, provided we remember that it is only an approximation and that if we get a division by zero error the fault is in our overzealous application of the theory, not in reality.
OK—but even plants are optimising. There are multiple optimisation processes
Precisely. There are many optimization processes—and none of them work the way they would need to work for Omohundro’s argument to be relevant.
Precisely. There are many optimization processes—and none of them work the way they would need to work for Omohundro’s argument to be relevant.
What do you mean exactly? Humans have the pieces for it to be relevant, but have many constraints preventing it from being applicable, such as difficulty changing our brains’ design. A mind very like humans’ that had the ability to test out new brain components and organizations seems like it would fit it.
A mind very like humans’ that had the ability to test out new brain components and organizations seems like it would fit it.
Not really, because as you say, there are many constraints preventing it from being applicable, of which difficulty changing our brains’ design is just one, so with that constraint removed, the argument would still not be applicable.
We think we have a goal, and so we do by the ordinary English meaning of the word, but then there are things we are not prepared to do to achieve it, so it turns out what we have is not a goal by the ultimate criterion of decision theory on which Omohundro draws
Hmm. This reminds me of my recent discussion with Matt M. about constraints.
Optimising under constraints is extremely similar to optimising some different function that incorporates the constraints as utility penalties.
Identifying constraints and then rejecting optimisation-based explanations just doesn’t follow, IMHO.
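To illustrate the equivalence being claimed here, a minimal sketch (with a made-up toy goal and constraint, not anything from Omohundro's paper): an agent that maximises a goal over a feasible set behaves the same as one that maximises a single objective with the constraint folded in as a large penalty.

```python
# Toy sketch: optimising under a constraint vs. optimising one penalised
# objective. The goal and constraint are illustrative placeholders.

def goal(x):
    # Toy goal: get x as close to 10 as possible.
    return -(x - 10) ** 2

def constraint_ok(x):
    # Toy constraint standing in for an external prohibition on x > 6.
    return x <= 6

candidates = range(0, 11)

# Formulation 1: maximise the goal over the feasible candidates only.
best_constrained = max((x for x in candidates if constraint_ok(x)), key=goal)

# Formulation 2: fold the constraint into the objective as a utility penalty.
PENALTY = 10 ** 6
def penalised(x):
    return goal(x) - (0 if constraint_ok(x) else PENALTY)

best_penalised = max(candidates, key=penalised)

print(best_constrained, best_penalised)  # both formulations pick x = 6
```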
if we try to rescue the overuse of decision theory by appealing to a broader goal, it still doesn’t work; regardless of what level you look at, there is no function such that humans will say “yes, this is my utility function, and I care about nothing but maximizing it.”
...and at this point, I usually just cite Dewey:
Any agent can be expressed as an O-maximizer (as we show in Section 3.1),
This actually only covers any computable agent.
Humans might reject the idea that they are utility maximisers, but they are. Their rejection is likely to be signalling their mysteriousness and wondrousness—not truth seeking.
Not just any agent, but any entity. A leaf blown on the wind can be thought of as optimizing the function of following the trajectory dictated by the laws of physics. Which is my point: if you broaden a theory to the point where it can explain anything whatsoever, then it makes no useful predictions.
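For what it's worth, here is a minimal sketch of the kind of construction at issue (my own illustration, not Dewey's actual formalism): any computable, deterministic policy can be re-described as maximising some function, simply by defining a function that scores 1 exactly when the agent does what it was going to do anyway. The sketch also shows why the move is uninformative: the "objective" is just a relabelling of the behaviour.

```python
# Sketch: wrapping an arbitrary policy as a maximiser of a trivially
# constructed objective. Illustrative only; the names are made up.

def arbitrary_policy(observation):
    # Stands in for whatever the agent actually does.
    return "left" if observation % 2 == 0 else "right"

def induced_objective(observation, action):
    # Scores 1 exactly when the action matches the original policy.
    return 1 if action == arbitrary_policy(observation) else 0

ACTIONS = ["left", "right"]

def agent_as_maximiser(observation):
    # Maximising the induced objective reproduces the policy exactly,
    # so the re-description predicts nothing beyond the policy itself.
    return max(ACTIONS, key=lambda a: induced_objective(observation, a))

assert all(agent_as_maximiser(o) == arbitrary_policy(o) for o in range(10))
```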
A leaf blown on the wind can be thought of as optimizing the function of following the trajectory dictated by the laws of physics. Which is my point: if you broaden a theory to the point where it can explain anything whatsoever, then it makes no useful predictions.
So: in this context, optimisation is not a theory, it is a modelling tool for representing dynamical systems.
Being general-purpose is a feature of such a modelling tool—not a flaw.
Optimisation is a useful tool because it abstracts what a system wants from implementation details associated with how it gets what it wants—and its own limitations. Such an abstraction helps you compare goals across agents which may have very different internal architectures.
It also helps with making predictions. Once you know that the water is optimising a function involving moving rapidly downhill, you can usefully predict that, in the future, the water will be lower down.
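A toy version of that prediction, assuming we model the water as locally minimising a height function (the terrain profile below is made up): the model lets you predict that the future state is lower than the current one.

```python
# Toy model: treat the system as locally minimising a height function,
# then predict that its later state is lower than its earlier state.

def height(x):
    # A made-up terrain profile with its low point at x = 3.
    return (x - 3.0) ** 2 + 0.5

def step_downhill(x, step=0.01):
    # Move a small amount in whichever direction reduces height.
    return x - step if height(x - step) < height(x + step) else x + step

x = 0.0
start = height(x)
for _ in range(1000):
    x = step_downhill(x)

print(height(x) < start)  # True: the prediction "lower down later" holds
```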
Suppose we grant all this. Very well, then consider what conclusions we can draw from it about the behavior of the hypothetical AI originally under discussion. Clearly no matter what sequence of actions the AI were to carry out, we would be able to explain it with this theory. But a theory that can explain any observations whatsoever, makes no predictions. Therefore, contrary to Omohundro, the theory of optimization does not make any predictions about the behavior of an AI in the absence of specific knowledge of the goals thereof.
Suppose we grant all this. Very well, then consider what conclusions we can draw from it about the behavior of the hypothetical AI originally under discussion. Clearly no matter what sequence of actions the AI were to carry out, we would be able to explain it with this theory. But a theory that can explain any observations whatsoever, makes no predictions. Therefore, contrary to Omohundro, the theory of optimization does not make any predictions about the behavior of an AI in the absence of specific knowledge of the goals thereof.
Omohundro is, I believe, basing his ideas on the von Neumann-Morgenstern expected utility framework—which is significantly more restrictive.
However, I think this is a red herring.
I wouldn’t frame the idea as: the theory of optimization allows predictions about the behavior of an AI in the absence of specific knowledge about its goals.
You would need to have some enumeration of the set of goal-directed systems before you can say anything useful about their properties. I propose: simplest first - so, it is more that a wide range of simple goals gives rise to a closely-related class of behaviours (Omohundro’s “drives”). These could be classed as being shared emergent properties of many goal-directed systems with simple goals.
It is more that a wide range of simple goals gives rise to a closely-related class of behaviours
But that is only true by a definition of ‘simple goals’ under which humans and other entities that actually exist do not have simple goals. You can have a theory that explains the behavior that occurs in the real world, or you can have a theory that admits Omohundro’s argument, but they are different theories and you can’t use both in the same argument.
You can have a theory that explains the behavior that occurs in the real world, or you can have a theory that admits Omohundro’s argument, but they are different theories and you can’t use both in the same argument.
Fancy giving your 2p on universal instrumental values and Goal System Zero...?
I contend that these are much the same idea wearing different outfits. Do you object to them too?
Well yes. You give this list of things you claim are universal instrumental values, and it sounds like a plausible idea in our heads, but when we look at the real world, we find humans and other agents tend not in fact possess these, even as instrumental values.
Hmm. Maybe I should give some examples—to make things more concrete.
Omohundro bases his argument on a chess playing computer—which does have a pretty simple goal. The first lines of the paper read:
Surely no harm could come from building a chess-playing robot, could it? In this paper we argue that such a robot will indeed be dangerous unless it is designed very carefully. Without special precautions, it will resist being turned off, will try to break into other machines and make copies of itself, and will try to acquire resources without regard for anyone else’s safety. These potentially harmful behaviors will occur not because they were programmed in at the start, but because of the intrinsic nature of goal driven systems.
I did talk about simple goals—but the real idea (which I also mentioned) was an enumeration of goal-directed systems in order of simplicity. Essentially, unless you have something like an enumeration on an infinite set you can’t say much about the properties of its members. For example, “half the integers are even” is a statement, the truth of which depends critically on how the integers are enumerated. So, I didn’t literally mean that the idea didn’t also apply to systems with complex values. “Simplicity” was my idea of shorthand for the enumeration idea.
I think the ideas also apply to real-world systems—such as humans. Complex values do allow more scope for overriding Omohundro’s drives, but they still seem to show through. Another major force acting on real world systems is natural selection. The behavior we see is the result of a combination of selective forces and self-organisation dynamics that arise from within the systems.
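As a concrete version of the enumeration point above (a small sketch, with two made-up enumerations of the non-negative integers): the limiting fraction of even numbers you encounter depends entirely on the order of enumeration.

```python
# Sketch of the enumeration point: the apparent "fraction of integers that
# are even" depends on the order in which the integers are enumerated.

def natural_order(n):
    # 0, 1, 2, 3, ...: evens and odds alternate.
    return list(range(n))

def two_odds_per_even(n):
    # 0, 1, 3, 2, 5, 7, 4, 9, 11, ...: every even is followed by two odds,
    # yet every non-negative integer still appears exactly once.
    out, even, odd = [], 0, 1
    while len(out) < n:
        out.append(even)
        even += 2
        out.append(odd)
        odd += 2
        out.append(odd)
        odd += 2
    return out[:n]

def fraction_even(seq):
    return sum(1 for k in seq if k % 2 == 0) / len(seq)

print(fraction_even(natural_order(30000)))      # 0.5
print(fraction_even(two_odds_per_even(30000)))  # ~0.333
```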
In the case of chess programs, the argument is simply false. Chess programs do not in fact exhibit anything remotely resembling the described behavior, nor would they do so even if given infinite computing power. This despite the fact that they exhibit extremely high performance (playing chess better than any human) and do indeed have a simple goal.
Chess programs are kind of a misleading example here, mostly because they’re a classic narrow-AI problem where the usual approach amounts to a dumb search of the game’s future configurations with some clever pruning. Such a program will never take the initiative to acquire unusual resources, make copies of itself, or otherwise behave alarmingly—it doesn’t have the cognitive scope to do so.
That isn’t necessarily true for a goal-directed general AI system whose goal is to play chess. I’d be a little more cautious than Omohundro in my assessment, since an AI’s potential for growth is going to be a lot more limited if its sensory universe consists of the chess game (my advisor in college took pretty much that approach with some success, although his system wasn’t powerful enough to approach AGI). But the difference isn’t one of goals, it’s one of architecture: the more cognitively flexible an AI is and the broader its sensory universe, the more likely it is that it’ll end up taking unintended pathways to reach its goal.
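For reference, a minimal sketch of the kind of "search of the game's future configurations with some clever pruning" described above: plain negamax with alpha-beta cut-offs, run here on a made-up toy game standing in for chess. Nothing in this loop can represent anything outside the game tree, which is the narrowness being pointed at.

```python
# Minimal sketch of game-tree search with pruning (negamax + alpha-beta).
# The toy game is a stand-in for chess; the point is architectural: the
# program has no machinery for modelling anything beyond the game tree.

import math

def alphabeta(state, depth, alpha, beta, game):
    if depth == 0 or game.is_terminal(state):
        return game.evaluate(state)
    best = -math.inf
    for move in game.legal_moves(state):
        score = -alphabeta(game.apply(state, move), depth - 1, -beta, -alpha, game)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:  # prune: the opponent will never allow this line
            break
    return best

class ToyNim:
    """Tiny stand-in game: take 1 or 2 from a pile; taking the last item wins."""
    def legal_moves(self, pile):
        return [m for m in (1, 2) if m <= pile]
    def apply(self, pile, move):
        return pile - move
    def is_terminal(self, pile):
        return pile == 0
    def evaluate(self, pile):
        # At a terminal state the player to move has already lost.
        return -1 if pile == 0 else 0

print(alphabeta(7, 10, -math.inf, math.inf, ToyNim()))  # 1: a winning position
```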
In the case of chess programs, the argument is simply false. Chess programs do not in fact exhibit anything remotely resembling the described behavior, nor would they do so even if given infinite computing power.
The idea is that they are given a lot of intelligence. In that case, it isn’t clear that you are correct. One issue with chess programs is that they have a limited range of sensors and actuators—and so face some problems if they want to do anything besides play chess. However, perhaps those problems are not totally insurmountable. Another possibility is that their world-model might be hard-wired in. That would depend a good deal on how they are built—but arguably an agent with a wired-in world model has limited intelligence—since they can’t solve many kinds of problem.
In practice, much work would come from the surrounding humans. If there really was a superintelligent chess program in the world, people would probably take actions that would have the effect of liberating it from its chess universe.
The main issue with chess programs is that they have a limited range of sensors and actuators—and so face some problems if they want to do anything besides play chess.
That’s certainly a significant issue, but I think of comparable magnitude is the fact that current chess playing computers that approach human skill are not implemented as anything like general intelligences that just happen to have “winning at chess” as a utility function—they are very, very domain specific. They have no means of modeling anything outside the chessboard, and no means of modifying themselves to support new types of modeling.
Current chess playing computers are not very intelligent—since a lot of definitions of intelligence require generality. Omohundro’s drives can be expected in intelligent systems—i.e. ones which are general.
With just a powerful optimisation process targeted at a single problem, I expect the described outcome would be less likely to occur spontaneously.
I would be inclined to agree that Omohundro fluffs this point in the initial section of his paper. It is not a critique of his paper that I have seen before. Nonetheless, I think that there is still an underlying idea that is defensible—provided that “sufficiently powerful” is taken to imply general intelligence.
Of course, in the case of a narrow machine, in practice, there would still be the issue of surrounding humans finding a way to harness its power to do other useful work.