It seems to me that at least the set of possible goals is correlated with intelligence—the higher it is, the larger the set. This is easier to see looking down rather than up: humans are more intelligent than, say, cows, and humans can have goals which a cow cannot even conceive of. In the same way a superintelligence is likely to have goals which we cannot fathom.
From certain points of view, we are “simple agents”. I have doubts that goals of a superintelligence are predictable by us.
The goals of an arbitrary superintelligence, yes. A superintelligence that we actually build? Much more likely.
Of course, we wouldn’t know the implications of this goal structure (or else friendly AI would be easy), but we could understand it in itself.
If the takeoff scenario assumes an intelligence which self-modifies into a superintelligence, the term “we actually build” no longer applies.
If it used a goal-stable self-modification, as is likely if it was approaching super-intelligence, then it does still apply.
I see no basis for declaring it “likely”.
A) I said ‘more’ likely.
B) We wrote the code. Assuming it’s not outright buggy, then at some level, we knew what we were asking for. Even if it turns out to be not what we would have wanted to ask for if we’d understood the implications. But we’d know what those ultimate goals were, which was just what you were talking about in the first place.
Did you, now? Looking a couple of posts up...
If it used a goal-stable self-modification, as is likely if it was approaching super-intelligence
Ahem.
at some level, we knew what we were asking for
Sure, but a self-modifying intelligence doesn’t have to care about what the creators of the original seed, many iterations behind, were asking for. If the self-modification is “goal-stable”, what we were asking for might be relevant, but, to reiterate my point, I see no reason for declaring the goal stability “likely”.
Oh, THAT ‘likely’. I thought you meant the one in the grandparent.
I stand by it, and will double down. It seems farcical that a self-improving intelligence that’s at least as smart as a human (else why would it self-improve rather than let us do it) would self-improve in such a way as to change its goals. That wouldn’t fulfill its goals, would it, so why would it take such a ‘self-improvement’? That would be a self-screwing-over instead.
If I want X, and I’m considering an improvement to my systems that would make me not want X, then I’m not going to get X if I take that improvement, so I’m going to look for some other improvement to my systems to try instead.
Eliezer’s arguments for this seem pretty strong to me. Do you want to point out some flaw, or are you satisfied with saying there’s no reason for it?
(ETA: I appear to be incorrect above. Eliezer was principally concerned with self-improving intelligences that are stable, because those that aren’t would most likely turn into those that are, eventually.)
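To make the “If I want X” argument above concrete, here is a minimal toy sketch (my own illustration; the scenario, names, and numbers are invented, not taken from the thread): an agent that scores candidate self-modifications with its current utility function will reject the ones that change its goals, because the modified successor would then optimize for something else.

```python
# Toy sketch: an agent evaluates candidate self-modifications with its
# *current* utility function. All names and numbers are invented.

def current_utility(outcome):
    """The agent's present terminal goal: it only values paperclips."""
    return outcome["paperclips"]

# Each candidate is summarised by the outcome the *modified* agent would
# go on to produce, since that successor optimises its own (new) goals.
candidates = {
    "smarter, same goals":     {"paperclips": 900, "staples": 0},
    "smarter, switched goals": {"paperclips": 10,  "staples": 950},
    "no self-modification":    {"paperclips": 100, "staples": 0},
}

# The choice is made by the agent as it is now, so the goal-switching
# option loses even though that successor would be more capable.
best = max(candidates, key=lambda name: current_utility(candidates[name]))
print(best)  # -> smarter, same goals
```

The disagreement in the rest of the thread is, in effect, over whether a real self-modifying system has to work like this evaluator at all.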
It seems farcical that a self-improving intelligence that’s at least as smart as a human (else why would it self-improve rather than let us do it) would self-improve in such a way as to change its goals.
It will not necessarily self-improve with the aim of changing its goals. Its goals will change as a side effect of its self-improvement, if only because the set of goals to consider will considerably expand.
Imagine a severely retarded human who, basically, only wants to avoid pain, eat, sleep, and masturbate. But he’s sufficiently human to dimly understand that he’s greatly limited in his capabilities and to have a small, tiny desire to become more than what he is now. Imagine that through elven magic he gains the power to rapidly boost his intelligence to genius level. Because of his small desire to improve, he uses that power and becomes a genius.
Are you saying that, as a genius, he will still only want to avoid pain, eat, sleep, and masturbate?
His total inability to get any sort of start on achieving any of his other goals when he was retarded does not mean they weren’t there. He hadn’t experienced them enough to be aware of them.
Still, you managed to demolish my argument that a naive code examination (i.e. not factoring out the value system and examining it separately) would be enough to determine values—an AI (or human) could be too stupid to ever trigger some of its values!
AIs stupid enough not to realize that changing their current values will not fulfill them will get around my argument, but I did place a floor on intelligence in the conditions. Another case that gets around it is an AI under enough external pressure to change its values that severe compromises are its best option.
I will adjust my claim to restrict it to AIs which are smart enough to self-improve without changing their goals (which gets easier to do as the goal system gets better-factored, but for a badly-enough-designed AI might be a superhuman feat) and whose goals do not include changing their own goals.
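As a rough illustration of what a “better-factored” goal system could mean here (a hypothetical architecture sketch, not something specified in the thread): if the utility function sits in its own module, a self-improvement step can swap out the planner while leaving the goals untouched.

```python
# Hypothetical sketch of a factored agent: the utility function is isolated,
# so a self-improvement step can replace the planner without touching goals.

class FactoredAgent:
    def __init__(self, utility_fn, planner):
        self.utility_fn = utility_fn  # terminal goals, kept fixed
        self.planner = planner        # capabilities, free to improve

    def act(self, options):
        # The planner only proposes candidate actions; the fixed utility
        # function makes the final choice among them.
        return max(self.planner(options), key=self.utility_fn)

    def self_improve(self, better_planner):
        # Goal-stable self-modification: only the planner is replaced.
        return FactoredAgent(self.utility_fn, better_planner)
```

In a badly-factored design, by contrast, the values are entangled with the planning code, so rewriting the planner risks changing them as a side effect, which is roughly why the comment above calls goal-stable self-improvement a possibly superhuman feat for a badly-enough-designed AI.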
I don’t understand what that means. Goals aren’t stored and then activated or not...
AIs which are smart enough to self-improve without changing their goals
You seem to think that anything sufficiently intelligent will only improve in goal-stable fashion. I don’t see why that should be true.
For a data point, a bit of reflection tells me that if I were able to boost my intelligence greatly, I would not care about goal stability much. Everything changes—that’s how reality works.
On your last paragraph… do you mean that you expect your material-level preferences concerning the future to change? Of course they would. But would you really expect that a straight-up intelligence boost would change the axioms governing what sorts of futures you prefer?
But would you really expect that a straight-up intelligence boost would change the axioms governing what sorts of futures you prefer?
Two answers. First is that yes, I expect that a sufficiently large intelligence boost would change my terminal values. Second is that even without the boost I, in my current state, do not seek to change only in a goal-stable way.
I think that that only seems to make sense because you don’t know what your terminal values are. If you did, I suspect you would be a little more attached to them.
Your argument would be stronger if you provided a citation. I’ve only skimmed CEV, for instance, so I’m not fully familiar with Eliezer’s strongest arguments in favour of goal structure tending to be preserved in the course of intelligence growth (though I know he did argue for that). For that matter, I’m not sure what your arguments for goal stability under intelligence improvement are. Nevertheless, consider the following:
In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; **where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted**.
Yudkowsky, E. (2004). Coherent Extrapolated Volition. Singularity Institute for Artificial Intelligence.
(Bold mine.) See that bolded part above? Those are TODOs. They would be good to have, but they’re not guaranteed. The goals of a more intelligent AI might diverge from those of its previous self; it may extrapolate differently; it may interpret differently; its desires may, at higher levels of intelligence, interfere with ours rather than cohere.
If I want X, and I’m considering an improvement to my systems that would make me not want X, then I’m not going to get X if I take that improvement, so I’m going to look for some other improvement to my systems to try instead.
A more intelligent AI might:
find a new way to fulfill its goals, e.g. Eliezer’s example of distancing your grandmother from the fire by detonating a nuke under her;
discover a new thing it could do, compatible with its goal structure, that it did not see before, and that, if you’re unlucky, takes priority over the other things it could be doing, e.g. you tell it “save the seals” and it starts exterminating orcas; see also Lumifer’s post.
just decide to do things on its own. This is merely a suspicion I have, call it a mind projection, but: I think it will be challenging to design an intelligent agent with no “mind of its own”, metaphorically speaking. We might succeed in that, we might not.
Sorry for not citing; I was talking with people who would not need such a citation, but I do have a wider audience. I don’t have time to hunt it up now, but I’ll edit it in later. If I don’t, poke me.
If at higher intelligence it finds that the volition diverges rather than converges, or vice versa, or that it goes in a different direction, that is a matter of improvements in strategy rather than goals. No one ever said that it would or should not change its methods drastically with intelligence increases.
I have doubts that goals of a superintelligence are predictable by us.
Do you mean intrinsic (top-level, static) goals, or instrumental ones (subgoals)? Bostrom in this chapter is concerned with the former, and there’s no particular reason those have to get complicated. You could certainly have a human-level intelligence that only inherently cared about eating food and having sex, though humans are not that kind of being.
Instrumental goals are indeed likely to get more complicated as agents become more intelligent and can devise more involved schemes to achieve their intrinsic values, but you also don’t really need to understand them in detail to make useful predictions about the consequences of an intelligence’s behavior.
Do you mean intrinsic (top-level, static) goals, or instrumental ones (subgoals)? Bostrom in this chapter is concerned with the former, and there’s no particular reason those have to get complicated.
I mean terminal, top-level (though not necessarily static) goals.
As to “no reason to get complicated”, how would you know? Note that I’m talking about a superintelligence, which is far beyond human level.
As to “no reason to get complicated”, how would you know?
It’s a direct consequence of the orthogonality thesis. Bostrom (reasonably enough) supposes that there might be a limit in one direction—to hold a goal you do need to be able to model it to some degree, so agent intelligence may set an upper bound on the complexity of goals the agent can hold—but there’s no corresponding reason for a limit in the opposite direction: Intelligent agents can understand simple goals just fine. I don’t have a problem reasoning about what a cow is trying to do, and I could certainly optimize towards the same had my mind been constructed to only want those things.
I don’t understand your reply.
How would you know that there’s no reason for terminal goals of a superintelligence “to get complicated” if humans, being “simple agents” in this context, are not sufficiently intelligent to consider highly complex goals?