They aren’t guaranteed to be immutable. It is merely the case that any agent that wants to optimize the world for some set of goals does not serve its objective by creating a more powerful agent with different goals.

An AI with multiple conflicting goals sounds incoherent. Do you mean a weighted average? The AI has to have some way to evaluate, numerically, its preference for one future over another, and I don’t think any goal system that spits out a real number indicating relative preference can be called “conflicting”. If an action gives points on one goal and detracts from another, the AI will simply form a weighted mix and evaluate whether it’s worth doing.

I cannot imagine an AI architecture that allows genuine internal conflict. Not even humans have that. I suspect it’s an incoherent concept. Do you mean the feeling of conflict that, in humans, arises from a choice between options that satisfy different drives? There’s no reason an AI could not be programmed to “feel” this way, though what good it would do I cannot imagine.

Nonetheless, at the end of the day, for any coherent agent, you can see whether its goal system has spit out “action worth > 0” or “action worth < 0” simply by whether it takes the action or not.
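To make the weighted-mix picture concrete, here is a minimal sketch in Python. The goal functions, weights, and scores are hypothetical stand-ins rather than a claim about any real architecture; the point is only that once the weights are fixed, several per-goal scores collapse into a single number, and that number alone decides whether the action is taken.

```python
# Hypothetical sketch: an agent scores an action on several goals and
# combines the scores with fixed weights into a single preference number.

def action_worth(action, goal_scorers, weights):
    """Return one real number: the weighted mix of per-goal scores."""
    return sum(w * score(action) for w, score in zip(weights, goal_scorers))

# Illustrative goals: the action helps goal A but detracts from goal B.
helps_a = lambda action: 2.0    # points gained on goal A
hurts_b = lambda action: -1.5   # points lost on goal B

worth = action_worth("some_action", [helps_a, hurts_b], weights=[0.5, 0.5])
take_it = worth > 0             # the agent acts iff the mix comes out positive
print(worth, take_it)           # 0.25 True
```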
The AI’s terminal goals are not guaranteed to be immutable. It is merely guaranteed that the AI will do its utmost to keep them unchanged, because that’s what terminal goals are. If it could desire to mutate them, then whatever was being mutated was not a terminal goal of the AI. The AI’s goals are the thing that determines the relative value of one future over another; I submit that an AI that values one thing but pointlessly acts to bring about a future containing a powerful optimizer that doesn’t value that thing is so ineffectual as to be almost unworthy of the term intelligence.
Could you build an AI like that? Sure, just take a planetary-size supercomputer and a few million years of random, scattershot exploratory programming… but why would you?
I don’t think any goal system that spits out a real number indicating relative preference can be called “conflicting”. If an action gives points on one goal and detracts from another, the AI will simply form a weighted mix and evaluate whether it’s worth doing.
It’s possible, though unlikely unless the situation is artificially constructed, for two mutually exclusive top-rated choices to have exactly equal utility. Of more practical concern: if the preference evaluation carries uncertainty, the utility ranges of the top two choices can overlap, in which case the AI may need to take meta-actions to resolve that uncertainty before choosing which action to take toward its goal.
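A rough illustration of that meta-action step, assuming purely for the sketch that each utility estimate comes with a symmetric error bar (the interval representation and the numbers are invented, not part of any real design):

```python
# Hypothetical sketch: act on the top-rated option only if its utility range
# clears the runner-up's; otherwise take a meta-action to reduce uncertainty.

def choose_or_investigate(options):
    """options: list of (name, estimated_utility, uncertainty) tuples."""
    ranked = sorted(options, key=lambda o: o[1], reverse=True)
    (best, u1, e1), (second, u2, e2) = ranked[0], ranked[1]
    if u1 - e1 > u2 + e2:                       # ranges don't overlap: clear winner
        return ("act", best)
    return ("investigate", [best, second])      # overlap: resolve uncertainty first

print(choose_or_investigate([("A", 10.0, 0.5), ("B", 9.0, 0.2)]))  # ('act', 'A')
print(choose_or_investigate([("A", 10.0, 2.0), ("B", 9.5, 1.0)]))  # ('investigate', ['A', 'B'])
```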
An AI with multiple conflicting goals sounds incoherent
Well humans exist despite having multiple conflicting goals.
The AI’s terminal goals are not guaranteed to be immutable. It is merely guaranteed that the AI will do its utmost to keep them unchanged, because that’s what terminal goals are. If it could desire to mutate them, then whatever was being mutated was not a terminal goal of the AI.
At this point, it’s not clear that the concept of “terminal goals” refers to anything in the territory.