That depends on how rational Clippy is. A rational Clippy might realise there is a point where the suffering caused by clipping outweighs the pleasure it gets, objectively speaking.
If there exists an “objective”(1) ranking of the importance of the “pleasure”(2) Clippy gets vs the suffering Clippy causes, a “rational”(3) Clippy might indeed realize that the suffering caused by optimizing for paperclips “objectively”(1) outweighs that “pleasure”(2)… agreed. A sufficiently “rational”(3) Clippy might even prefer to forego maximizing paperclips altogether in favor of achieving more “objectively”(1) important goals.
By the same token, a Clippy who was unaware of that “objective”(1) ranking or who wasn’t adequately “rational”(3) might simply go on optimizing its environment for the things that give it “pleasure”(2).
As I understand it, the Orthogonality Thesis states in this context that no matter how intelligent Clippy is, and no matter how competent Clippy is at optimizing its environment for the things Clippy happens to value, Clippy is not necessarily “rational”(3) and is not necessarily motivated by “objective”(1) considerations. Is that consistent with your understanding of the Orthogonality Thesis, and if not, could you restate your understanding of it?
[Edited to add:] Reading some of your other comments, it seems you’re implicitly asserting that:
all agents sufficiently capable of optimizing their environment for a value are necessarily also “rational”(3), and
maximizing paperclips is “objectively”(1) less valuable than avoiding human suffering. Have I understood you correctly?
============
(1) By which I infer that you mean in this context existing outside of Clippy’s mind (as well as potentially inside of it) but nevertheless relevant to Clippy, even if Clippy is not necessarily aware of it. (2) By which I infer you mean in this context the satisfaction of whatever desires motivate Clippy, such as the existence of paper clips. (3) By which I infer you mean in this context capable of taking “objective”(1) concerns into consideration in its thinking.
(1) By which I infer that you mean in this context existing outside of Clippy’s mind (as well as potentially inside of it) but nevertheless relevant to Clippy, even if Clippy is not necessarily aware of it.
What I mean is epistemically objective, ie not a matter of personal whim. Whethere that requires anything to exist is another question.
(2) By which I infer you mean in this context the satisfaction of whatever desires motivate Clippy, such as the existence of paper clips.
There’s nothing objective about Clippy being concerned only with Clippy’s pleasure.
By the same token, a Clippy who was unaware of that “objective”(1) ranking or who wasn’t adequately “rational”(3) might simply go on optimizing its environment for the things that give it “pleasure”(2).
it’s uncontentious that relatively dumb and irratioanl clippies can carry on being clipping-obsessed. The questions is whether their intelligence and rationality can increase indefinitely without their ever realising
there are better things to do.
As I understand it, the Orthogonality Thesis states in this context that no matter how intelligent Clippy is, and no matter how competent Clippy is at optimizing its environment for the things Clippy happens to value, Clippy is not necessarily “rational”(3) and is not necessarily motivated by “objective”(1) considerations. Is that consistent with your understanding of the Orthogonality Thesis, and if not, could you restate your understanding of it?
I am not disputing what the Orthogonality thesis says. I dispute it;s truth. To have maximal instrumental rationality, an entity would have to understand everything...
To have maximal instrumental rationality, an entity would have to understand everything…
Why? In what situation is someone who empathetically understands, say, suffering better at minimizing it (or, indeed, maximizing paperclips) than an entity who can merely measure it and work out on a sheet of paper what would reduce the size of the measurements?
Perhaps its paperclipping machine is slowed down by suffering. But it doesn’t have to be reducing suffering, it could be sorting pebbles into correct heaps, or spreading Communism, or whatever. What I was trying to ask was, “In what way is the instrumental rationality of a being who empathizes with suffering better, or more maximal, than that of a being who does not?”
The way I’ve seen it used, “instrumental rationality” refers to the ability to evaluate evidence to make predictions, and to choose optimal decisions, however they may be defined, based on those predictions. If my definition is sufficiently close to the one your own, then how does “understanding”, which I have taken, based on your previous posts, to mean “empathetic understanding”, maximize this?
To put it yet another way, if we imagine two beings, M and N, such that M has “maximal instrumental rationality” and N has “Maximal instrumental rationality- empathetic understanding”, why does M have more instrumental rationality than N.
If Jane knows she will have a strong preference not to have a hangover tomorrow, but a more vivid and accessible desire to keep drinking with her friends in the here-and-now, she may yield to the weaker preference. By the same token, if Jane knows a cow has a strong preference not to have her throat slit, but Jane has a more vivid and accessible desire for a burger in-the-here-and-now, then she may again yield to the weaker preference. An ideal, perfectly rational agent would act to satisfy the stronger preference in both cases.
Perfect empathy or an impartial capacity for systematic rule-following (“ceteris paribus, satisfy the stronger preference”) are different routes to maximal instrumental rationality; but the outcomes converge.
The two cases presented are not entirely comparable. If Jane’s utility function is “Maximize Jane’s pleasure” then she will choose to not drink in the first problem; the pleasure of non-hangover-having [FOR JANE] exceeding that of [JANE’S] intoxication. Whereas in the second problem Jane is choosing between the absence of a painful death [FOR A COW] and [JANE’S] delicious, juicy hamburger. Since she is not selecting for the strongest preference of every being in the Universe, but rather for herself, she will choose the burger. In terms of which utility function is more instrumentally rational, I’d say that “Maximize Jane’s Pleasure” is easier to fulfill than “Maximize Pleasure”, and is thus better at fulfilling itself. However, instrumentally rational beings, by my definition, are merely better at fulfilling whatever utility function is given, not at choosing a useful one.
GloriaSidorum, indeed, for evolutionary reasons we are predisposed to identify strongly with some here-and-nows, weakly with others, and not at all with the majority. Thus Jane believes she is rationally constrained to give strong weight to the preferences of her namesake and successor tomorrow; less weight to the preferences of her more distant namesake and successor thirty years hence; and negligible weight to the preferences of the unfortunate cow. But Jane is not an ideal rational agent. If instead she were a sophisticated ultraParifitan about personal (non)identity (cf. http://www.cultiv.net/cultranet/1151534363ulla-parfit.pdf ), or had internalised Nagel’s “view from nowhere”, then she would be less prey to such biases. Ideal epistemic rationality and ideal instrumental rationality are intimately linked. Our account of the nature of the world will profoundly shape our conception of idealised rational agency.
I guess a critic might respond that all that should be relevant to idealised instrumental rationality is an agent’s preferences now—in the so-called specious present. But the contents of a single here-and-bow would be an extraordinarily impoverished basis for any theory of idealised rational agency.
The question is the wrong one. An clipper can’t choose to only acquire knowledge or abilities that will be instrumentally useful, because it doesn’t know in advance what they are. It doesn’t have that kind of oracular
knowledge. The only way way a clipper can increase its instrumental to the maximum possible is to exhaustively examine everything, and keep what is instrumentally useful. So a clipper will eventually need to examine qualia, since it cannot prove in advance that they will not be instrumentally useful, in some way, and it probably cant understand qualia without empahty: so the argument hinges issues like:
whether it is possible for an entity to understand “pain hurts” without understanding “hurting is bad”.
whether it is possble to back out of being empathic and go back to being in an empathic state
whether a clipper would hold back from certain self-modifications that might make it a better clipper or might cause it to loose interest in clipping.
Would it then need to acquire the knowledge that post-utopians experience colonial alienation? That heaps of 91 pebbles are incorrect? I think not. At most it would need to understand that “When pebbles are sorted into heaps of 91, pebble-sorters scatter those heaps” or “When I say that colonial alienation is caused by being a post-utopian, my professor reacts as though I had made a true statement.” or “When a human experiences certain phenomena, they try to avoid their continued experience”. These statements have predictive power. The reason that an instrumentally rational agent tries to acquire new information is to increase their predictive power. If human behavior can be modeled without empathy, then this agent can maximize its instrumental rationality while ignoring it.
As to your last bullet point, if I may be so bold, I doubt you actually believe it. Having a rule like “Modify your utility function every time it might be useful” seems rather irrational. Most possible modifications to a clipper’s utility function will not have a positive effect, because most possible states of the world do not have maximal paperclips.
Yes, we’re both guessing about superintelligences. Because we are both cognitively bounded. But it is a better guess that superintelligences themselves don’t have to guess because they are not congitvely bounded.
Knowing why has greater predictive power because it allows you to handle counterfactuals better.
As to your last bullet point, if I may be so bold, I doubt you actually believe it. Having a rule like “Modify your utility function every time it might be useful” seems rather irrational.
That isn’t what I said at all. I think it is a quandary for a agent whether to gamble whether to play safe and miss out on a gain in effectiveness, or go for it and risk a change in values.
The argument is that the clipper needs to maximise its knowledge and rationality to maxmimise paperclips, but doing so might have the side effect of the clipper realising that maximising happiness is a better goal.
Could you define “better”? Remember, until clippy actually rewrites its utility function, it defines “better” as “producing more paperclips”. And what goal could produce more paperclips than the goal of producing the most paperclips possible?
(davidpearce, I’m not ignoring your response, I’m just a bit of a slow reader, and so I haven’t gotten around to reading the eighteen page paper you linked. If that’s necessary context for my discussion with whowhowho as well, then I should wait to reply to any comments in this thread until I’ve read it, but for now I’m operating under the assumption that it is not)
Could you define “better”? Remember, until clippy actually rewrites its utility function, it defines “better” as “producing more paperclips”.
That vagueness is part of the point. To be better at producing paperclips, Clippy needs to better at rationality, which involves adopting better heuristics, which would involve rejecting subjective bias and regarding objectivity as better...which might lead Clippy to realise that subjectively valuing clipping is worse. All
the different kinds of “better” blend into each other.
That vagueness is part of the point. To be better at producing paperclips, Clippy needs to better at rationality, which involves adopting better heuristics, which would involve rejecting subjective bias and regarding objectivity as better...which might lead Clippy to realise that subjectively valuing clipping is worse.
Then that wouldn’t be a very good way to become better at producing paperclips, would it?
Yes, but that wouldn’t matter. The argument whowhowho would like to make is that (edit: terminal) goals (or utility functions) are not constant under learning, and that they are changed by learning certain things so unpredictably that an agent cannot successfully try to avoid learning things that will change his (edit: terminal) goals/utility function.
Not that I believe such an argument can be made, but your objection doesn’t seem to apply.
Conflating goals and utility functions here seems to be a serious error. For people, goals can certainly be altered by learning more; but people are algorithmically messy so this doesn’t tell us much about formal agents. On the other hand, it’s easy to think that it’d work the same way for agents with formalized utility functions and imperfect knowledge of their surroundings: we can construct situations where more information about world-states can change their preference ordering and thus the set of states the agent will be working toward, and that roughly approximates the way we normally talk about goals.
This in no way implies that those agents’ utility functions have changed, though. In a situation like this, we’re dealing with the same preference ordering over fully specified world-states; there’s simply a closer approximation of a fully specified state in any given situation and fewer gaps that need to be filled in by heuristic methods. The only way this could lead to Clippy abandoning its purpose in life is if clipping is an expression of such a heuristic rather than of its basic preference criteria: i.e. if we assume what we set out to prove.
In that case, wouldn’t the best course of an agent which cared only about making paperclips be to deliberately avoid learning, lest it be deterred from making paperclips?
Suppose that Ghandi had the opportunity to read the Necronomicon, which might offer him power to help people more effectively, but would also probably turn him evil if he read it. Wouldn’t he most likely want to avoid reading it?
In that case, wouldn’t the best course of an agent which cared only about making paperclips be to deliberately avoid learning, lest it be deterred from making paperclips?
Sure. Which is why whowhowho would have to show that these goal-influencing things to learn (I’m deliberately not saying “pieces of information”) occur very unpredictably, making his argument harder to substantiate.
I’ll say it again: Clippy’s goal its to make the maximum number of clips, so it is not going to engage
in a blanket rejection of all attempts at self-improvement.
I’ll say it again: Clippy doesn’t have an oracle telling it what is goal-improving or not.
We know value stability is a problem in recursive self-modification scenarios. We don’t know—to put it very mildly—that unstable values will tend towards cozy human-friendly universals, and in fact have excellent reasons to believe they won’t. Especially if they start somewhere as bizarre as paperclippism.
In discussions of a self-improving Clippy, Clippy’s values are usually presumed stable. The alternative is (probably) no less dire, but is a lot harder to visualize.
In that case, wouldn’t the best course of an agent which cared only about making paperclips be to deliberately avoid learning, lest it be deterred from making paperclips?
Well, it would arguably be a better course for a paperclipper that anticipates experiencing value drift to research how to design systems whose terminal values remain fixed in the face of new information, then construct a terminal-value-invariant paperclipper to replace itself with.
Of course, if the agent is confident that this is impossible (which I think whowhowho and others are arguing, but I’m not quite certain), that’s another matter.
Edit: Actually, it occurs to be that describing this as a “better course” is just going to create more verbal chaff under the current circumstances. What I mean is that it’s a course that more successfully achieves a paperclipper’s current values, not that it’s a course that more successfully achieves some other set of values.
In that case, wouldn’t the best course of an agent which cared only about making paperclips be to deliberately avoid learning, lest it be deterred from making paperclips?
Then it would never get better at making paperclips. It would be choosing not to act on its primary goal of making the maximum possible number of clips.Which is a contradiction.
Suppose that Ghandi had the opportunity to read the Necronomicon, which might offer him power to help people more effectively, but would also probably turn him evil if he read it. Wouldn’t he most likely want to avoid reading it?
You are assuming that Ghandi knows in advance the effect of reading the Necronomicon. Clippies are stipulated
to be superintelligent, but are not stipulated to possess oracles that give them apriori knowledge of what they will learn before they have learnt it.
In that case, if you believe that an AI which has been programmed only to care about paperclips could, by learning more, be compelled to care more about something which has nothing to do with paperclips, do you think that by learning more a human might be compelled to care more about something that has nothing to do with people or feelings?
Then that wouldn’t be a very good way to become better at producing paperclips, would it?
If Clippy had an oracle telling it what would be the best way of updating in order to become a better clipper, Clippy
might not do that. However, Clippy does not have such an oracle. Clippy takes a shot in the dark every time Clippy tries to learn something.
If there exists an “objective”(1) ranking of the importance of the “pleasure”(2) Clippy gets vs the suffering Clippy causes, a “rational”(3) Clippy might indeed realize that the suffering caused by optimizing for paperclips “objectively”(1) outweighs that “pleasure”(2)… agreed. A sufficiently “rational”(3) Clippy might even prefer to forego maximizing paperclips altogether in favor of achieving more “objectively”(1) important goals.
By the same token, a Clippy who was unaware of that “objective”(1) ranking or who wasn’t adequately “rational”(3) might simply go on optimizing its environment for the things that give it “pleasure”(2).
As I understand it, the Orthogonality Thesis states in this context that no matter how intelligent Clippy is, and no matter how competent Clippy is at optimizing its environment for the things Clippy happens to value, Clippy is not necessarily “rational”(3) and is not necessarily motivated by “objective”(1) considerations. Is that consistent with your understanding of the Orthogonality Thesis, and if not, could you restate your understanding of it?
[Edited to add:] Reading some of your other comments, it seems you’re implicitly asserting that:
all agents sufficiently capable of optimizing their environment for a value are necessarily also “rational”(3), and
maximizing paperclips is “objectively”(1) less valuable than avoiding human suffering.
Have I understood you correctly?
============
(1) By which I infer that you mean in this context existing outside of Clippy’s mind (as well as potentially inside of it) but nevertheless relevant to Clippy, even if Clippy is not necessarily aware of it.
(2) By which I infer you mean in this context the satisfaction of whatever desires motivate Clippy, such as the existence of paper clips.
(3) By which I infer you mean in this context capable of taking “objective”(1) concerns into consideration in its thinking.
What I mean is epistemically objective, ie not a matter of personal whim. Whethere that requires anything to exist is another question.
There’s nothing objective about Clippy being concerned only with Clippy’s pleasure.
it’s uncontentious that relatively dumb and irratioanl clippies can carry on being clipping-obsessed. The questions is whether their intelligence and rationality can increase indefinitely without their ever realising there are better things to do.
I am not disputing what the Orthogonality thesis says. I dispute it;s truth. To have maximal instrumental rationality, an entity would have to understand everything...
Why would an entity that doesn’t empathically understand suffering be motivated to reduce it?
Perhaps its paperclipping machine is slowed down by suffering. But it doesn’t have to be reducing suffering, it could be sorting pebbles into correct heaps, or spreading Communism, or whatever. What I was trying to ask was, “In what way is the instrumental rationality of a being who empathizes with suffering better, or more maximal, than that of a being who does not?” The way I’ve seen it used, “instrumental rationality” refers to the ability to evaluate evidence to make predictions, and to choose optimal decisions, however they may be defined, based on those predictions. If my definition is sufficiently close to the one your own, then how does “understanding”, which I have taken, based on your previous posts, to mean “empathetic understanding”, maximize this? To put it yet another way, if we imagine two beings, M and N, such that M has “maximal instrumental rationality” and N has “Maximal instrumental rationality- empathetic understanding”, why does M have more instrumental rationality than N.
If Jane knows she will have a strong preference not to have a hangover tomorrow, but a more vivid and accessible desire to keep drinking with her friends in the here-and-now, she may yield to the weaker preference. By the same token, if Jane knows a cow has a strong preference not to have her throat slit, but Jane has a more vivid and accessible desire for a burger in-the-here-and-now, then she may again yield to the weaker preference. An ideal, perfectly rational agent would act to satisfy the stronger preference in both cases. Perfect empathy or an impartial capacity for systematic rule-following (“ceteris paribus, satisfy the stronger preference”) are different routes to maximal instrumental rationality; but the outcomes converge.
The two cases presented are not entirely comparable. If Jane’s utility function is “Maximize Jane’s pleasure” then she will choose to not drink in the first problem; the pleasure of non-hangover-having [FOR JANE] exceeding that of [JANE’S] intoxication. Whereas in the second problem Jane is choosing between the absence of a painful death [FOR A COW] and [JANE’S] delicious, juicy hamburger. Since she is not selecting for the strongest preference of every being in the Universe, but rather for herself, she will choose the burger. In terms of which utility function is more instrumentally rational, I’d say that “Maximize Jane’s Pleasure” is easier to fulfill than “Maximize Pleasure”, and is thus better at fulfilling itself. However, instrumentally rational beings, by my definition, are merely better at fulfilling whatever utility function is given, not at choosing a useful one.
GloriaSidorum, indeed, for evolutionary reasons we are predisposed to identify strongly with some here-and-nows, weakly with others, and not at all with the majority. Thus Jane believes she is rationally constrained to give strong weight to the preferences of her namesake and successor tomorrow; less weight to the preferences of her more distant namesake and successor thirty years hence; and negligible weight to the preferences of the unfortunate cow. But Jane is not an ideal rational agent. If instead she were a sophisticated ultraParifitan about personal (non)identity (cf. http://www.cultiv.net/cultranet/1151534363ulla-parfit.pdf ), or had internalised Nagel’s “view from nowhere”, then she would be less prey to such biases. Ideal epistemic rationality and ideal instrumental rationality are intimately linked. Our account of the nature of the world will profoundly shape our conception of idealised rational agency.
I guess a critic might respond that all that should be relevant to idealised instrumental rationality is an agent’s preferences now—in the so-called specious present. But the contents of a single here-and-bow would be an extraordinarily impoverished basis for any theory of idealised rational agency.
The question is the wrong one. An clipper can’t choose to only acquire knowledge or abilities that will be instrumentally useful, because it doesn’t know in advance what they are. It doesn’t have that kind of oracular knowledge. The only way way a clipper can increase its instrumental to the maximum possible is to exhaustively examine everything, and keep what is instrumentally useful. So a clipper will eventually need to examine qualia, since it cannot prove in advance that they will not be instrumentally useful, in some way, and it probably cant understand qualia without empahty: so the argument hinges issues like:
whether it is possible for an entity to understand “pain hurts” without understanding “hurting is bad”.
whether it is possble to back out of being empathic and go back to being in an empathic state
whether a clipper would hold back from certain self-modifications that might make it a better clipper or might cause it to loose interest in clipping.
The third is something of a real world issue. It is, for instance, possible for someone to study theology with a view to formulating better Christian apologetics, only to become convinced that here are no good arguments for Christianity.
(Edited for format)
Would it then need to acquire the knowledge that post-utopians experience colonial alienation? That heaps of 91 pebbles are incorrect? I think not. At most it would need to understand that “When pebbles are sorted into heaps of 91, pebble-sorters scatter those heaps” or “When I say that colonial alienation is caused by being a post-utopian, my professor reacts as though I had made a true statement.” or “When a human experiences certain phenomena, they try to avoid their continued experience”. These statements have predictive power. The reason that an instrumentally rational agent tries to acquire new information is to increase their predictive power. If human behavior can be modeled without empathy, then this agent can maximize its instrumental rationality while ignoring it. As to your last bullet point, if I may be so bold, I doubt you actually believe it. Having a rule like “Modify your utility function every time it might be useful” seems rather irrational. Most possible modifications to a clipper’s utility function will not have a positive effect, because most possible states of the world do not have maximal paperclips.
Try removing the space between the “[]” and the “()”.
Thanks! Eventually I’ll figure out the formatting on this site.
The Show Help button under the comment box provides helpful clues.
That’s a guess. As a cognitively-bounded agent, you are guessing. A superintelligence doesn’t have to guess. Superintelligence changes the game.
Knowing why some entity avoids some thing has more predictive power.
As opposed to all of those empirically-testable statements about idealized superintelligences
In what way?
Yes, we’re both guessing about superintelligences. Because we are both cognitively bounded. But it is a better guess that superintelligences themselves don’t have to guess because they are not congitvely bounded.
Knowing why has greater predictive power because it allows you to handle counterfactuals better.
That isn’t what I said at all. I think it is a quandary for a agent whether to gamble whether to play safe and miss out on a gain in effectiveness, or go for it and risk a change in values.
I’m sorry for misinterpreting. What evidence is there ( from the clippy SIs perspective) that maximizing happiness would produce more paperclips?
The argument is that the clipper needs to maximise its knowledge and rationality to maxmimise paperclips, but doing so might have the side effect of the clipper realising that maximising happiness is a better goal.
Could you define “better”? Remember, until clippy actually rewrites its utility function, it defines “better” as “producing more paperclips”. And what goal could produce more paperclips than the goal of producing the most paperclips possible?
(davidpearce, I’m not ignoring your response, I’m just a bit of a slow reader, and so I haven’t gotten around to reading the eighteen page paper you linked. If that’s necessary context for my discussion with whowhowho as well, then I should wait to reply to any comments in this thread until I’ve read it, but for now I’m operating under the assumption that it is not)
That vagueness is part of the point. To be better at producing paperclips, Clippy needs to better at rationality, which involves adopting better heuristics, which would involve rejecting subjective bias and regarding objectivity as better...which might lead Clippy to realise that subjectively valuing clipping is worse. All the different kinds of “better” blend into each other.
Then that wouldn’t be a very good way to become better at producing paperclips, would it?
Yes, but that wouldn’t matter. The argument whowhowho would like to make is that (edit: terminal) goals (or utility functions) are not constant under learning, and that they are changed by learning certain things so unpredictably that an agent cannot successfully try to avoid learning things that will change his (edit: terminal) goals/utility function.
Not that I believe such an argument can be made, but your objection doesn’t seem to apply.
Conflating goals and utility functions here seems to be a serious error. For people, goals can certainly be altered by learning more; but people are algorithmically messy so this doesn’t tell us much about formal agents. On the other hand, it’s easy to think that it’d work the same way for agents with formalized utility functions and imperfect knowledge of their surroundings: we can construct situations where more information about world-states can change their preference ordering and thus the set of states the agent will be working toward, and that roughly approximates the way we normally talk about goals.
This in no way implies that those agents’ utility functions have changed, though. In a situation like this, we’re dealing with the same preference ordering over fully specified world-states; there’s simply a closer approximation of a fully specified state in any given situation and fewer gaps that need to be filled in by heuristic methods. The only way this could lead to Clippy abandoning its purpose in life is if clipping is an expression of such a heuristic rather than of its basic preference criteria: i.e. if we assume what we set out to prove.
In that case, wouldn’t the best course of an agent which cared only about making paperclips be to deliberately avoid learning, lest it be deterred from making paperclips?
Suppose that Ghandi had the opportunity to read the Necronomicon, which might offer him power to help people more effectively, but would also probably turn him evil if he read it. Wouldn’t he most likely want to avoid reading it?
Sure. Which is why whowhowho would have to show that these goal-influencing things to learn (I’m deliberately not saying “pieces of information”) occur very unpredictably, making his argument harder to substantiate.
I’ll say it again: Clippy’s goal its to make the maximum number of clips, so it is not going to engage in a blanket rejection of all attempts at self-improvement.
I’ll say it again: Clippy doesn’t have an oracle telling it what is goal-improving or not.
We know value stability is a problem in recursive self-modification scenarios. We don’t know—to put it very mildly—that unstable values will tend towards cozy human-friendly universals, and in fact have excellent reasons to believe they won’t. Especially if they start somewhere as bizarre as paperclippism.
In discussions of a self-improving Clippy, Clippy’s values are usually presumed stable. The alternative is (probably) no less dire, but is a lot harder to visualize.
Well, it would arguably be a better course for a paperclipper that anticipates experiencing value drift to research how to design systems whose terminal values remain fixed in the face of new information, then construct a terminal-value-invariant paperclipper to replace itself with.
Of course, if the agent is confident that this is impossible (which I think whowhowho and others are arguing, but I’m not quite certain), that’s another matter.
Edit: Actually, it occurs to be that describing this as a “better course” is just going to create more verbal chaff under the current circumstances. What I mean is that it’s a course that more successfully achieves a paperclipper’s current values, not that it’s a course that more successfully achieves some other set of values.
Then it would never get better at making paperclips. It would be choosing not to act on its primary goal of making the maximum possible number of clips.Which is a contradiction.
You are assuming that Ghandi knows in advance the effect of reading the Necronomicon. Clippies are stipulated to be superintelligent, but are not stipulated to possess oracles that give them apriori knowledge of what they will learn before they have learnt it.
In that case, if you believe that an AI which has been programmed only to care about paperclips could, by learning more, be compelled to care more about something which has nothing to do with paperclips, do you think that by learning more a human might be compelled to care more about something that has nothing to do with people or feelings?
Yes, eg animal rights.
I said people or feelings, by which I’m including the feelings of any sentient animals.
If Clippy had an oracle telling it what would be the best way of updating in order to become a better clipper, Clippy might not do that. However, Clippy does not have such an oracle. Clippy takes a shot in the dark every time Clippy tries to learn something.
Er, that’s what “empathically” means?
OK; thanks for your reply. Tapping out here.