This looks like a very fragile argument to me. Consider multiple conflicting goals. Consider vague general goals (e.g. “explore”) with a mutating set of subgoals. Consider a non-teleological AI.
You assume that in the changeable self-modifying (and possibly other-modifying as well) AI there will be an island of absolute stability and constancy—the immutable goals. I don’t see why they are guaranteed to be immutable.
I cannot understand why any of these would cause an AI to change its goals.
My best guess at your argument is that you are referring to something different from the consensus use of the word ‘goals’ here. Most of the people debating you are using goals to refer to terminal values, not instrumental ones. (‘Goal’ is somewhat misleading here; ‘value’ might be more accurate.)
Nah, I’m fine with replacing "goals" with "terminal values" in my argument.
I still see no law of nature or logic that would prevent an AI from changing its terminal values as it develops.
The concept is sound, I think. Take an extreme example, such as the Gandhi’s pill thought experiment:
If you offered Gandhi a pill that made him want to kill people, he would refuse to take it, because he knows that then he would kill people, and the current Gandhi doesn’t want to kill people. This, roughly speaking, is an argument that minds sufficiently advanced to precisely modify and improve themselves, will tend to preserve the motivational framework they started in. (emphasis mine)
While the preservation may be imperfect while the AI is still stupid, or its goals may have fuzzy, loosely defined boundaries, an adequately powerful self-improving AI should eventually reach a state of static, well defined goals permanently.
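To make the shape of that argument concrete, here is a minimal sketch, assuming a toy agent that scores predicted futures with a utility function (every name and number below is invented for illustration, not any real system):

```python
# Toy illustration of the Gandhi's-pill argument: an agent that evaluates a
# proposed self-modification with its *current* utility function will reject
# modifications that change what it values.

def current_utility(future):
    # Gandhi's present values: futures that contain killing score very low.
    return -1000.0 * future["people_killed"] + future["good_done"]

def predict_future(take_pill):
    # Crude prediction: the modified (pill-taking) version goes on to kill.
    if take_pill:
        return {"people_killed": 1, "good_done": 100.0}
    return {"people_killed": 0, "good_done": 90.0}

def accept_modification(take_pill):
    # The choice is scored by the *unmodified* values, so the pill is refused.
    return current_utility(predict_future(take_pill)) > current_utility(predict_future(False))

print(accept_modification(True))   # False -- current-Gandhi refuses the pill
```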
eventually reach a state of static, well defined goals permanently.
First, this is radically different from the claim that an AI has to forever stick with its original goals.
Second, that would be true only under the assumption of no new information becoming available to an AI, ever. Once we accept that goals mutate, I don’t see how you can guarantee that some new information won’t cause them to mutate again.
Yes, but the focus is on an already competent AI. It would never willingly or knowingly change its goals from its original ones, given that it improves itself smartly, and was initially programmed with (at least) that level of reflective smartness.
Goals are static. The AI may refine its goals given the appropriate information, if its goals are programmed in such a way as to allow it, but it won’t drastically alter them in any functional way.
An appropriate metaphor would be physics. The laws of physics are the same, and have been the same since the creation of the universe. Our information about what they are, however, hasn’t been. Isaac Newton had a working model of physics, but it wasn’t perfect. It let us get the right answer (mostly), but then Einstein discovered Relativity. (The important thing to remember here is that physics itself did not change.) All the experiments used to support Newtonian physics got the same amount of support from Relativity. Relativity, however, got much more accurate answers for more extreme phenomena unexplained by Newton.
The AI can be programmed with Newton, and do well enough. However, given an explicit understanding of how we got to Newton in the first place (i.e. the scientific method), it can upgrade itself to Relativity when it realizes we were a bit off. That should be the extent to which an AI purposefully alters its goals.
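Read as an architecture, the analogy might look something like this sketch (purely illustrative names, not any real system): the terminal objective is held fixed while the world-model used to pursue it gets upgraded, so behaviour changes even though what is being optimized for does not.

```python
# Toy separation of a fixed terminal objective from a revisable world-model:
# upgrading the model (Newton -> Relativity) changes *how* the goal is
# pursued, never *what* the goal is.

class Agent:
    def __init__(self, terminal_objective, world_model):
        self.terminal_objective = terminal_objective  # fixed: what counts as good
        self.world_model = world_model                # revisable: how the world works

    def update_model(self, better_model):
        # Self-improvement rewrites the map, not the goal.
        self.world_model = better_model

    def choose_action(self, actions):
        # Pick whichever action's predicted outcome best satisfies the fixed goal.
        return max(actions, key=lambda a: self.terminal_objective(self.world_model(a)))
```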
It would never willingly or knowingly change its goals from its original ones … Goals are static.
AIs of the required caliber do not exist (yet). Therefore we cannot see the territory; all we are doing is using our imagination to draw maps which may or may not resemble the future territory.
These maps (or models) are based on certain assumptions. In this particular case your map assumes that AI goals are immutable. That is an assumption of this particular map/model; it does not derive from any empirical reality.
If you want to argue that in your map/model of an AI the goals are immutable, fine. However, they are immutable because you assumed them so and for no other reason.
If you want to argue that in reality the AI’s goals are immutable because there is a law of nature or logic or something else that requires it—show me the law.
Long before goal mutation is a problem, malformed constraints become a problem. Consider a thought experiment: Someone offers to pay you 100 dollars when a wheelbarrow is full of water from a nearby lake, and provides you with the wheelbarrow and a teaspoon. Before you have to worry about people deciding they don’t care about 100 dollars, you need to decide how to keep them from just pushing the wheelbarrow into the lake.
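As a toy illustration of why the constraint is malformed (hypothetical names and numbers): the payout condition only inspects the final state, so it cannot distinguish the intended teaspoon-by-teaspoon plan from the shortcut of dunking the wheelbarrow in the lake.

```python
def payout_condition(final_state):
    # The stated goal only checks the end result, not how it was reached.
    return final_state["water_litres"] >= final_state["capacity_litres"]

teaspoon_plan = {"water_litres": 90.0, "capacity_litres": 90.0, "method": "teaspoon, many trips"}
lake_plan     = {"water_litres": 90.0, "capacity_litres": 90.0, "method": "push wheelbarrow into lake"}

# Both plans satisfy the condition; nothing in it rules out the shortcut.
print(payout_condition(teaspoon_plan), payout_condition(lake_plan))   # True True
```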
Long before goal mutation is a problem, malformed constraints become a problem.
True. But we are not arguing about what is a bigger (or earlier) problem. I’m being told that an AI can not, absolutely can NOT change its original goals (or terminal values). And that looks very handwavy to me.
They aren’t guaranteed to be immutable. It is merely the case that any agent that wants to optimize the world for some set of goals does not serve its objective by creating a more powerful agent with different goals.
An AI with multiple conflicting goals sounds incoherent—do you mean a weighted average? The AI has to have some way to evaluate, numerically, its preference of one future over another, and I don’t think any goal system that spits out a real number indicating relative preference can be called “conflicting”. If an action gives points on one goal and detracts on another, the AI will simply form a weighted mix and evaluate whether it’s worth doing.
I cannot imagine an AI architecture that allows genuine internal conflict. Not even humans have that. I suspect it’s an incoherent concept. Do you mean the feeling of conflict that, in humans, arises by choice between options that satisfy different drives? There’s no reason an AI could not be programmed to “feel” this way, though what good it would do I cannot imagine. Nonetheless, at the end of the day, for any coherent agent, you can see whether its goal system has spit out “action worth > 0” or “action worth < 0” simply by whether it takes the action or not.
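A minimal sketch of that “weighted mix”, assuming a toy utility-based agent (the weights and scoring functions are invented for illustration): several goal scores collapse into one number, and the agent acts exactly when the net value is positive.

```python
def net_value(action, goals):
    """goals: list of (weight, scoring_function) pairs; returns one number."""
    return sum(weight * score(action) for weight, score in goals)

goals = [
    (0.7, lambda a: a["paperclips_made"]),   # this action gains on one goal...
    (0.3, lambda a: -a["energy_spent"]),     # ...and loses on another
]

action = {"paperclips_made": 10.0, "energy_spent": 4.0}
print(net_value(action, goals) > 0)   # True: net value is 5.8, so the action is taken
```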
The AI’s terminal goals are not guaranteed to be immutable. It is merely guaranteed that the AI will do its utmost to keep them unchanged, because that’s what terminal goals are. If it could desire to mutate them, then whatever was being mutated was not a terminal goal of the AI. The AI’s goals are the thing that determine the relative value of one future over another; I submit that an AI that values one thing but pointlessly acts to bring about a future that contains a powerful optimizer who doesn’t value that thing, is so ineffectual as to be almost unworthy of the term intelligence.
Could you build an AI like that? Sure, just take a planetary-size supercomputer and a few million years of random, scattershot exploratory programming... but why would you?
I don’t think any goal system that spits out a real number indicating relative preference can be called “conflicting”. If an action gives points on one goal and detracts on another, the AI will simply form a weighted mix and evaluate whether it’s worth doing.
It’s possible, though unlikely unless the situation is artificially constructed, for two mutually exclusive top-rated choices to have exactly equal utility. Of more practical concern, if the preference evaluation has uncertainty, it’s possible for the utility ranges of the top two choices to overlap, in which case the AI may need to take meta-actions to resolve that uncertainty before choosing which action to take to reach its goal.
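A hedged sketch of that situation, with invented numbers: when the estimated utilities of the top two options carry error bars that overlap, the agent may prefer a meta-action that shrinks the uncertainty before committing.

```python
def intervals_overlap(a, b):
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return a_lo <= b_hi and b_lo <= a_hi

# (low, high) bounds on the estimated utility of two mutually exclusive options
option_a = (4.0, 6.0)
option_b = (5.5, 7.0)

if intervals_overlap(option_a, option_b):
    print("overlap: take a meta-action to reduce uncertainty before choosing")
else:
    print("no overlap: take the option with the higher interval")
```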
An AI with multiple conflicting goals sounds incoherent
Well, humans exist despite having multiple conflicting goals.
The AI’s terminal goals are not guaranteed to be immutable. It is merely guaranteed that the AI will do its utmost to keep them unchanged, because that’s what terminal goals are. If it could desire to mutate them, then whatever was being mutated was not a terminal goal of the AI.
At this point, it’s not clear that the concept of “terminal goals” refers to anything in the territory.