The conclusion I’d draw from this essay is that one can’t necessarily derive a “goal” or a “utility function” from a given pattern of behavior. If you ask “What is the robot’s goal?”, the answer is, “it doesn’t have one,” because it doesn’t assign a total preference ordering to states of the world. At best, you could say that it prefers state [I SEE BLUE AND I SHOOT] to state [I SEE BLUE AND I DON’T SHOOT]. But that’s all.
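To make this concrete, here is a minimal sketch (my own illustration; the robot’s actual code isn’t given in the post) of the difference between a bare condition-action rule and an agent that really does have a utility function:

    # Hypothetical rendering of the robot: a bare condition-action rule.
    # No ranking of world-states appears anywhere in it, so the question
    # "what is its goal?" has nothing to point at.
    def robot_step(percept):
        if percept == "BLUE":
            return "SHOOT"
        return "DO_NOTHING"

    # Contrast: an agent with a utility function ranks every predicted
    # outcome and picks the action whose outcome it prefers.
    def maximizer_step(percept, actions, predict_outcome, utility):
        return max(actions, key=lambda a: utility(predict_outcome(percept, a)))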
This has some implications for AI, I think. First of all, not every computer program has a goal or a utility function. There is no danger that your TurboTax software will take over the world and destroy all human life, because it doesn’t have a general goal to maximize the number of completed tax forms. Even rather sophisticated algorithms can completely lack goals of this kind—they aren’t designed to maximize some variable over all possible states of the universe. It seems that the unfriendly-AI scenario is a risk only if an AI has a true goal function, and many useful advances in artificial intelligence (defined in the broad sense) carry no risk of this kind.
Do humans have goals? I don’t know; it’s plausible that we have goals that are complex and hard to define succinctly, and it’s also plausible that we don’t have goals at all, just sets of instructions like “SHOOT AT BLUE.” The test would seem to be whether a human goal of “PROMOTE VALUE X” continues to imply behaviors in strange and unfamiliar circumstances, or whether we only have rules of behavior for a few common situations. If you can think clearly about ethics (or preferences) in the far future, or the distant past, or regarding unfamiliar kinds of beings, and your opinions have some consistency, then maybe those ethical beliefs or preferences are goals. But probably many kinds of human behavior are more like sets of instructions than goals.
No; you couldn’t even say that much. Placing a blue-tinted mirror in front of the robot will make it shoot itself, even though that greatly diminishes its future ability to shoot. In general, a generic program really can’t be assigned any nontrivial utility function.
Destroying the robot greatly diminishes its future ability to shoot, but it also greatly diminishes its future ability to see blue. The robot doesn’t prefer ‘shooting blue’ to ‘not shooting blue’; it prefers ‘seeing blue and shooting’ to ‘seeing blue and not shooting’.
So the original poster was right.
Edit: I’m wrong, see below
If the robot knows that its camera is indestructible but its gun isn’t, it would still shoot at the mirror and destroy only its gun.
So it would be [I SEE BLUE AND I TRY TO SHOOT].
… except that it wouldn’t mind if shooting itself damaged its own program so that it wouldn’t even try to shoot if it saw blue anymore.
Ok, I am inclined to agree that its behaviour can’t be described in terms of goals.
This is a very awesome post. Thumbs up.
What does it mean for a program to have intelligence if it does not have a goal (or components that have goals)?
The point of any incremental intelligence increase is to let the program make more choices, and perhaps choices at higher levels of abstraction. Even at low intelligence levels, the AI will only ‘do a good job’ if the basis of those choices adequately matches the basis we would use to make the same choice (a close match at some level of abstraction just below the choice, not at the level of the substrate or of the basic algorithms).
Creating ‘goal-less’ AI still has the machine making more choices for more complex reasons, and allows for non-obvious mismatches between what it does and what we intended it to do.
Yes, you can look at paperclip-manufacturing software and see that it is not a paperclipper, but some component might still be optimizing for something else entirely. We can reject the anthropomorphically obvious goal and there can still be a powerful optimization process that affects the total system, at the expense of both human values and produced paperclips.
Consider automatic document translation. Making the translator more complex and more accurate doesn’t imbue it with goals. It might easily be the case that in a few years, we achieve near-human accuracy at automatic document translation without major breakthroughs in any other area of AI research.
Making it more accurate is not the same as making it more intelligent. The question is: how does making something “more intelligent” change the nature of the inaccuracies? In translation especially, there can be a bias without any real inaccuracy.
Goallessness at the level of the program is not what makes translators safe. They are safe because neither they nor any component is intelligent.
Most professional computer scientists and programmers I know routinely describe algorithms as “smart”, “dumb”, “intelligent”, etc. In context, a smarter algorithm exploits more properties of the input or of the problem. I think this is a reasonable use of language, and it’s the one I had in mind.
(I am open to using some other definition of algorithmic intelligence, if you care to supply one.)
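As a small illustration of that usage (my example, not the commenter’s): both functions below answer the same membership question, but the second is “smarter” in this sense because it exploits a property of the input, namely that it is sorted.

    import bisect

    def dumb_contains(items, x):
        # Uses no structure in the input: checks every element.
        return any(item == x for item in items)

    def smart_contains(sorted_items, x):
        # Exploits the sortedness of the input via binary search.
        i = bisect.bisect_left(sorted_items, x)
        return i < len(sorted_items) and sorted_items[i] == x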
I don’t see why making an algorithm smarter or more general would make it dangerous, so long as it stays fundamentally a (non-self-modifying) translation algorithm. There certainly will be biases in a smart algorithm. But dumb algorithms and humans have biases too.
I generally go with cross-domain optimization power (see http://wiki.lesswrong.com/wiki/Optimization_process). Note that an optimization target is not the same thing as a goal, and the process doesn’t need to exist within obvious boundaries. Evolution is goalless and disembodied.
If an algorithm is smart because a programmer has encoded everything that needs to be known to solve a problem, great. That probably reduces the potential for error, especially in well-defined environments. But this is not what’s going on in translation programs, or even in the voting system here (which is based on reddit’s). As systems like this creep up in complexity, their errors and biases become more subtle (especially since we ‘fix’ them so that they usually work well). If an algorithm happens to be powerful in multiple domains, then the errors themselves might be optimized for something entirely different, and perhaps unrecognizable.
By your definition I would tend to agree that they are not dangerous so long as their generalized capabilities are below human level (which seems to be the case for everything so far), with some complex caveats. For example, ‘non-self-modifying’ likely gives a false sense of security: if an AI has access to a medium which can be used to do computations, and the AI is good at making algorithms, then it could build a powerful, if not superintelligent, program in that medium.
Also, my concern in this thread has never been about the translation algorithm, the tax program, or even the paperclipper. It’s about some sub-process which happens to be a powerful optimizer (in a hypothetical situation where we do more AI research on the premise that it is safe if it is in a goalless program).
This is a very interesting question (“What does it mean for a program to have intelligence if it does not have a goal?”); thanks for making me think about it.
Based on your other comments elsewhere in this thread, it seems like you and I are in agreement that intelligence is about having the capability to make better choices. That is, given two agents with an identical problem and identical resources to work with, the more intelligent agent is more likely to make the “better” choice.
What does “better” mean here? We need to define some sort of goal, and then compare the outcomes of their choices and how closely those outcomes match that goal. I have a couple of disorganized thoughts here:
The goal is necessary only for us, as outsiders, to compare the intelligence of the two agents. The goal is not necessary for the existence of intelligence in the agents if no one is interested in measuring their intelligence. (I try to make this concrete with a small sketch after these points.)
Assuming the agents are cooperative, you can temporarily assign subgoals. For example, perhaps you and I would like to know which one of us is smarter. You and I might have many different goals, but we might agree to temporarily take on a shared goal (e.g. win this game of chess, or get the highest number of correct answers on this IQ test, etc.) so that our intelligence can be compared.
The “assigning” of goals to an intelligence strongly implies to me that goals are orthogonal to intelligence. Intelligence is the capability to fulfil any general goal, and it’s possible for someone to be intelligent even if they do not (currently, or ever) have any goals. If we come up with a new trait called Sodadrinkability which is the capability to drink a given soda, one can say that I possess Sodadrinkability—that I am capable of drinking a wide range of possible sodas provided to me—even if I do not currently (or ever) have any sodas to drink.
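Here is the sketch promised above (a toy example of my own, with a made-up “pick the largest number” task): neither agent contains a goal or a utility function; the goal lives only in the evaluator that compares them.

    import random

    def first_item_agent(numbers):
        # A fixed rule: take whatever it sees first.
        return numbers[0]

    def largest_item_agent(numbers):
        # A different fixed rule, one that happens to match what the
        # evaluator below rewards.
        return max(numbers)

    def measure(agent, trials=1000):
        # The goal ("pick the maximum") exists only here, in the measuring
        # apparatus, not inside either agent.
        hits = 0
        for _ in range(trials):
            numbers = [random.randint(0, 100) for _ in range(10)]
            if agent(numbers) == max(numbers):
                hits += 1
        return hits / trials

    print(measure(first_item_agent))    # roughly 0.1
    print(measure(largest_item_agent))  # 1.0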
Let me suggest that the difference between goal-less behavior and goal-driven behavior is that goal-driven behavior seeks means to attain its end. The means will vary with circumstances, while the end remains relatively invariant. Another indication of goal-driven behavior is that means are often prepared in anticipation of need, rather than in response to present need.
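A rough sketch of that distinction (my own illustration, using a toy numeric “world”): the goal-driven step keeps the same end while its means change with whatever actions happen to be available, whereas the rule-driven step just reacts.

    def goal_driven_step(state, goal, available_actions):
        # Choose whichever currently available action lands closest to the
        # goal: the means vary with circumstances, the end stays fixed.
        return min((act(state) for act in available_actions),
                   key=lambda result: abs(goal - result))

    def rule_driven_step(state, sees_blue):
        # A fixed stimulus-response rule; no end state is represented.
        return "SHOOT" if sees_blue else "WAIT"

    # Same end (reach 10), different means depending on what is on offer:
    print(goal_driven_step(0, 10, [lambda s: s + 1, lambda s: s + 3]))   # 3
    print(goal_driven_step(0, 10, [lambda s: s - 5, lambda s: s + 20]))  # 20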
I said “relatively invariant” because goals can be, and often are, hierarchical. An example was outlined by Maslow in his “A Theory of Human Motivation” in the Psychological Review (1943). Maslow aside, in problem solving we often resort to staged solutions in which the means to a higher-order goal become a new sub-goal, and so on iteratively, until we reach low-level goals within our immediate grasp.
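A quick sketch of that staged-solution idea (my own illustration; the decomposition table is made up): sub-goals are expanded recursively until we reach low-level goals within immediate grasp.

    # Means to a higher-order goal become new sub-goals, recursively.
    DECOMPOSE = {
        "have dinner": ["have ingredients", "cook"],
        "have ingredients": ["go to store", "buy food"],
    }

    def plan(goal):
        if goal not in DECOMPOSE:  # a low-level goal we can act on directly
            return [goal]
        steps = []
        for subgoal in DECOMPOSE[goal]:
            steps.extend(plan(subgoal))
        return steps

    print(plan("have dinner"))  # ['go to store', 'buy food', 'cook']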
A second point is that terms such as “purposeful” and “goal-seeking” are analogously predicated. For a term to be analogously predicated, it is applied to different cases with a meaning that is partly the same and partly different. Thus, a goal-seeking robot is not goal-seeking because it intends any goals of its own, but because it is the vehicle by which the designer seeks to effect his or her goals. In the parable, if the goal was the destruction of blue-uniformed enemies, that goal was only intended by the robot’s creators. Since the robot is an instantiated means of attaining that goal, we may speak, analogously, of it as having the same goal. The important point is that we mean different things in saying “the designer has a goal” and “the robot has a goal.” Each works toward the same end (so the meaning is partly the same), but only the designer intends that end (so the meaning is partly different). (BTW this kind of analogy is an “analogy of attribution.”)
The fact that the robot is ineffective in attaining its end is a side issue that might be solved by employing better algorithms (edge and pattern recognition, etc.). There is no evidence that better algorithms will give the robot intentions in the sense that the designer has intentions.