It is more that a wide range of simple goals gives rise to a closely-related class of behaviours.
But that is only true by a definition of ‘simple goals’ under which humans and other entities that actually exist do not have simple goals. You can have a theory that explains the behavior that occurs in the real world, or you can have a theory that admits Omohundro’s argument, but they are different theories and you can’t use both in the same argument.
Fancy giving your 2p on universal instrumental values and Goal System Zero...? I contend that these are much the same idea wearing different outfits. Do you object to them too?
Well yes. You give this list of things you claim are universal instrumental values, and it sounds like a plausible idea in our heads, but when we look at the real world, we find that humans and other agents do not, in fact, possess these, even as instrumental values.
Hmm. Maybe I should give some examples—to make things more concrete.
Omohundro bases his argument on a chess-playing computer—which does have a pretty simple goal. The first lines of the paper read:
Surely no harm could come from building a chess-playing robot, could it? In this paper we argue that such a robot will indeed be dangerous unless it is designed very carefully. Without special precautions, it will resist being turned off, will try to break into other machines and make copies of itself, and will try to acquire resources without regard for anyone else’s safety. These potentially harmful behaviors will occur not because they were programmed in at the start, but because of the intrinsic nature of goal driven systems.
I did talk about simple goals—but the real idea (which I also mentioned) was an enumeration of goal-directed systems in order of simplicity. Essentially, unless you have something like an enumeration on an infinite set, you can’t say much about the properties of its members. For example, “half the integers are even” is a statement whose truth depends critically on how the integers are enumerated. So, I didn’t literally mean that the idea didn’t also apply to systems with complex values. “Simplicity” was my shorthand for the enumeration idea.
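To make the enumeration point concrete, here is a small sketch (my own illustration, with made-up enumerations; nothing from the original discussion). Under the usual ordering of the positive integers, the running fraction of even numbers tends to 1/2; under an ordering that lists two odd numbers for every even one, it tends to 1/3, even though both orderings cover exactly the same set.

```python
# The truth of "half the integers are even" depends on the enumeration.
# Both generators below enumerate every positive integer exactly once.

def usual_order():
    n = 1
    while True:
        yield n
        n += 1

def two_odds_then_one_even():
    # Emits 1, 3, 2, 5, 7, 4, 9, 11, 6, ...: two odds for every even.
    odd, even = 1, 2
    while True:
        yield odd
        yield odd + 2
        yield even
        odd += 4
        even += 2

def even_fraction(enumeration, n=100_000):
    # Fraction of evens among the first n terms of the enumeration.
    gen = enumeration()
    return sum(1 for _ in range(n) if next(gen) % 2 == 0) / n

print(even_fraction(usual_order))             # ~0.5
print(even_fraction(two_odds_then_one_even))  # ~0.333
```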
I think the ideas also apply to real-world systems—such as humans. Complex values do allow more scope for overriding Omohundro’s drives, but they still seem to show through. Another major force acting on real-world systems is natural selection. The behavior we see is the result of a combination of selective forces and self-organisation dynamics that arise from within the systems.
In the case of chess programs, the argument is simply false. Chess programs do not in fact exhibit anything remotely resembling the described behavior, nor would they do so even if given infinite computing power. This despite the fact that they exhibit extremely high performance (playing chess better than any human) and do indeed have a simple goal.
Chess programs are kind of a misleading example here, mostly because they’re a classic narrow-AI problem where the usual approach amounts to a dumb search of the game’s future configurations with some clever pruning. Such a program will never take the initiative to acquire unusual resources, make copies of itself, or otherwise behave alarmingly—it doesn’t have the cognitive scope to do so.
That isn’t necessarily true for a goal-directed general AI system whose goal is to play chess. I’d be a little more cautious than Omohundro in my assessment, since an AI’s potential for growth is going to be a lot more limited if its sensory universe consists of the chess game (my advisor in college took pretty much that approach with some success, although his system wasn’t powerful enough to approach AGI). But the difference isn’t one of goals, it’s one of architecture: the more cognitively flexible an AI is and the broader its sensory universe, the more likely it is that it’ll end up taking unintended pathways to reach its goal.
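To flesh out the “dumb search with some clever pruning” description above, here is a bare-bones sketch of the kind of game-tree search classic chess engines are built around (my own illustration; the legal_moves, apply and evaluate helpers on the game state are hypothetical stand-ins, not any real engine’s API). The point is structural: the program’s entire option set is the list of legal moves, so however deep it searches, actions like “copy myself” or “acquire more hardware” are simply not representable.

```python
# Minimal negamax search with alpha-beta pruning: the core loop of a
# classic narrow chess engine. The game state is assumed to expose
# legal_moves(), apply(move) and evaluate() (hypothetical helpers for
# this sketch; evaluate() scores a position from the mover's viewpoint).

def negamax(state, depth, alpha=float("-inf"), beta=float("inf")):
    moves = state.legal_moves()
    if depth == 0 or not moves:
        return state.evaluate()
    best = float("-inf")
    for move in moves:
        # Score the reply position from the opponent's side, then negate.
        score = -negamax(state.apply(move), depth - 1, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # the "clever pruning": the opponent would never allow this line
    return best

def choose_move(state, depth=4):
    # The argmax ranges only over chess moves; nothing outside the board.
    return max(state.legal_moves(),
               key=lambda m: -negamax(state.apply(m), depth - 1))
```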
In the case of chess programs, the argument is simply false. Chess programs do not in fact exhibit anything remotely resembling the described behavior, nor would they do so even if given infinite computing power.
The idea is that they are given a lot of intelligence. In that case, it isn’t clear that you are correct. One issue with chess programs is that they have a limited range of sensors and actuators—and so face some problems if they want to do anything besides play chess. However, perhaps those problems are not totally insurmountable. Another possibility is that their world-model might be hard-wired in. That would depend a good deal on how they are built—but arguably an agent with a wired-in world model has limited intelligence, since it can’t solve many kinds of problem.
In practice, much of the work would come from the surrounding humans. If there really were a superintelligent chess program in the world, people would probably take actions that would have the effect of liberating it from its chess universe.
The main issue with chess programs is that they have a limited range of sensors and actuators—and so face some problems if they want to do anything besides play chess.
That’s certainly a significant issue, but I think one of comparable magnitude is the fact that current chess-playing computers that approach human skill are not implemented as anything like general intelligences that just happen to have “winning at chess” as a utility function—they are very, very domain-specific. They have no means of modeling anything outside the chessboard, and no means of modifying themselves to support new types of modeling.
Current chess-playing computers are not very intelligent—since a lot of definitions of intelligence require generality. Omohundro’s drives can be expected in intelligent systems—i.e. ones which are general.
With just a powerful optimisation process targeted at a single problem, I expect the described outcome would be less likely to occur spontaneously.
I would be inclined to agree that Omohundro fluffs this point in the initial section of his paper. It is not a critique of his paper that I have seen before. Nonetheless, I think that there is still an underlying idea that is defensible—provided that “sufficiently powerful” is taken to imply general intelligence.
Of course, in the case of a narrow machine, there would in practice still be the issue of surrounding humans finding a way to harness its power to do other useful work.