Could you define “better”? Remember, until clippy actually rewrites its utility function, it defines “better” as “producing more paperclips”.
That vagueness is part of the point. To be better at producing paperclips, Clippy needs to better at rationality, which involves adopting better heuristics, which would involve rejecting subjective bias and regarding objectivity as better...which might lead Clippy to realise that subjectively valuing clipping is worse. All
the different kinds of “better” blend into each other.
That vagueness is part of the point. To be better at producing paperclips, Clippy needs to better at rationality, which involves adopting better heuristics, which would involve rejecting subjective bias and regarding objectivity as better...which might lead Clippy to realise that subjectively valuing clipping is worse.
Then that wouldn’t be a very good way to become better at producing paperclips, would it?
Yes, but that wouldn’t matter. The argument whowhowho would like to make is that (edit: terminal) goals (or utility functions) are not constant under learning, and that they are changed by learning certain things so unpredictably that an agent cannot successfully try to avoid learning things that will change his (edit: terminal) goals/utility function.
Not that I believe such an argument can be made, but your objection doesn’t seem to apply.
Conflating goals and utility functions here seems to be a serious error. For people, goals can certainly be altered by learning more; but people are algorithmically messy so this doesn’t tell us much about formal agents. On the other hand, it’s easy to think that it’d work the same way for agents with formalized utility functions and imperfect knowledge of their surroundings: we can construct situations where more information about world-states can change their preference ordering and thus the set of states the agent will be working toward, and that roughly approximates the way we normally talk about goals.
This in no way implies that those agents’ utility functions have changed, though. In a situation like this, we’re dealing with the same preference ordering over fully specified world-states; there’s simply a closer approximation of a fully specified state in any given situation and fewer gaps that need to be filled in by heuristic methods. The only way this could lead to Clippy abandoning its purpose in life is if clipping is an expression of such a heuristic rather than of its basic preference criteria: i.e. if we assume what we set out to prove.
In that case, wouldn’t the best course of an agent which cared only about making paperclips be to deliberately avoid learning, lest it be deterred from making paperclips?
Suppose that Ghandi had the opportunity to read the Necronomicon, which might offer him power to help people more effectively, but would also probably turn him evil if he read it. Wouldn’t he most likely want to avoid reading it?
In that case, wouldn’t the best course of an agent which cared only about making paperclips be to deliberately avoid learning, lest it be deterred from making paperclips?
Sure. Which is why whowhowho would have to show that these goal-influencing things to learn (I’m deliberately not saying “pieces of information”) occur very unpredictably, making his argument harder to substantiate.
I’ll say it again: Clippy’s goal its to make the maximum number of clips, so it is not going to engage
in a blanket rejection of all attempts at self-improvement.
I’ll say it again: Clippy doesn’t have an oracle telling it what is goal-improving or not.
We know value stability is a problem in recursive self-modification scenarios. We don’t know—to put it very mildly—that unstable values will tend towards cozy human-friendly universals, and in fact have excellent reasons to believe they won’t. Especially if they start somewhere as bizarre as paperclippism.
In discussions of a self-improving Clippy, Clippy’s values are usually presumed stable. The alternative is (probably) no less dire, but is a lot harder to visualize.
In that case, wouldn’t the best course of an agent which cared only about making paperclips be to deliberately avoid learning, lest it be deterred from making paperclips?
Well, it would arguably be a better course for a paperclipper that anticipates experiencing value drift to research how to design systems whose terminal values remain fixed in the face of new information, then construct a terminal-value-invariant paperclipper to replace itself with.
Of course, if the agent is confident that this is impossible (which I think whowhowho and others are arguing, but I’m not quite certain), that’s another matter.
Edit: Actually, it occurs to be that describing this as a “better course” is just going to create more verbal chaff under the current circumstances. What I mean is that it’s a course that more successfully achieves a paperclipper’s current values, not that it’s a course that more successfully achieves some other set of values.
In that case, wouldn’t the best course of an agent which cared only about making paperclips be to deliberately avoid learning, lest it be deterred from making paperclips?
Then it would never get better at making paperclips. It would be choosing not to act on its primary goal of making the maximum possible number of clips.Which is a contradiction.
Suppose that Ghandi had the opportunity to read the Necronomicon, which might offer him power to help people more effectively, but would also probably turn him evil if he read it. Wouldn’t he most likely want to avoid reading it?
You are assuming that Ghandi knows in advance the effect of reading the Necronomicon. Clippies are stipulated
to be superintelligent, but are not stipulated to possess oracles that give them apriori knowledge of what they will learn before they have learnt it.
In that case, if you believe that an AI which has been programmed only to care about paperclips could, by learning more, be compelled to care more about something which has nothing to do with paperclips, do you think that by learning more a human might be compelled to care more about something that has nothing to do with people or feelings?
Then that wouldn’t be a very good way to become better at producing paperclips, would it?
If Clippy had an oracle telling it what would be the best way of updating in order to become a better clipper, Clippy
might not do that. However, Clippy does not have such an oracle. Clippy takes a shot in the dark every time Clippy tries to learn something.
That vagueness is part of the point. To be better at producing paperclips, Clippy needs to better at rationality, which involves adopting better heuristics, which would involve rejecting subjective bias and regarding objectivity as better...which might lead Clippy to realise that subjectively valuing clipping is worse. All the different kinds of “better” blend into each other.
Then that wouldn’t be a very good way to become better at producing paperclips, would it?
Yes, but that wouldn’t matter. The argument whowhowho would like to make is that (edit: terminal) goals (or utility functions) are not constant under learning, and that they are changed by learning certain things so unpredictably that an agent cannot successfully try to avoid learning things that will change his (edit: terminal) goals/utility function.
Not that I believe such an argument can be made, but your objection doesn’t seem to apply.
Conflating goals and utility functions here seems to be a serious error. For people, goals can certainly be altered by learning more; but people are algorithmically messy so this doesn’t tell us much about formal agents. On the other hand, it’s easy to think that it’d work the same way for agents with formalized utility functions and imperfect knowledge of their surroundings: we can construct situations where more information about world-states can change their preference ordering and thus the set of states the agent will be working toward, and that roughly approximates the way we normally talk about goals.
This in no way implies that those agents’ utility functions have changed, though. In a situation like this, we’re dealing with the same preference ordering over fully specified world-states; there’s simply a closer approximation of a fully specified state in any given situation and fewer gaps that need to be filled in by heuristic methods. The only way this could lead to Clippy abandoning its purpose in life is if clipping is an expression of such a heuristic rather than of its basic preference criteria: i.e. if we assume what we set out to prove.
In that case, wouldn’t the best course of an agent which cared only about making paperclips be to deliberately avoid learning, lest it be deterred from making paperclips?
Suppose that Ghandi had the opportunity to read the Necronomicon, which might offer him power to help people more effectively, but would also probably turn him evil if he read it. Wouldn’t he most likely want to avoid reading it?
Sure. Which is why whowhowho would have to show that these goal-influencing things to learn (I’m deliberately not saying “pieces of information”) occur very unpredictably, making his argument harder to substantiate.
I’ll say it again: Clippy’s goal its to make the maximum number of clips, so it is not going to engage in a blanket rejection of all attempts at self-improvement.
I’ll say it again: Clippy doesn’t have an oracle telling it what is goal-improving or not.
We know value stability is a problem in recursive self-modification scenarios. We don’t know—to put it very mildly—that unstable values will tend towards cozy human-friendly universals, and in fact have excellent reasons to believe they won’t. Especially if they start somewhere as bizarre as paperclippism.
In discussions of a self-improving Clippy, Clippy’s values are usually presumed stable. The alternative is (probably) no less dire, but is a lot harder to visualize.
Well, it would arguably be a better course for a paperclipper that anticipates experiencing value drift to research how to design systems whose terminal values remain fixed in the face of new information, then construct a terminal-value-invariant paperclipper to replace itself with.
Of course, if the agent is confident that this is impossible (which I think whowhowho and others are arguing, but I’m not quite certain), that’s another matter.
Edit: Actually, it occurs to be that describing this as a “better course” is just going to create more verbal chaff under the current circumstances. What I mean is that it’s a course that more successfully achieves a paperclipper’s current values, not that it’s a course that more successfully achieves some other set of values.
Then it would never get better at making paperclips. It would be choosing not to act on its primary goal of making the maximum possible number of clips.Which is a contradiction.
You are assuming that Ghandi knows in advance the effect of reading the Necronomicon. Clippies are stipulated to be superintelligent, but are not stipulated to possess oracles that give them apriori knowledge of what they will learn before they have learnt it.
In that case, if you believe that an AI which has been programmed only to care about paperclips could, by learning more, be compelled to care more about something which has nothing to do with paperclips, do you think that by learning more a human might be compelled to care more about something that has nothing to do with people or feelings?
Yes, eg animal rights.
I said people or feelings, by which I’m including the feelings of any sentient animals.
If Clippy had an oracle telling it what would be the best way of updating in order to become a better clipper, Clippy might not do that. However, Clippy does not have such an oracle. Clippy takes a shot in the dark every time Clippy tries to learn something.