Exactly. A consequentialist paperclip maximizer does not have to feel anything about paperclips. It just… maximizes their number.
This is an incorrect, anthropomorphic model:
Human: “Clippy, did you ever think about the beauty of joy, and the horrors of torture?”
Clippy: “Human, did you ever think about the beauty of paperclips, and the horrors of their absence?”
This is more correct:
Human: “Clippy, did you ever think about the beauty of joy, and the horrors of torture?”
Clippy: (ignores the human and continues to maximize paperclips)
Or more precisely, Clippy would say “X” to the human if and only if saying “X” would maximize the number of paperclips. The value of X would be completely unrelated to any internal state of Clippy, unless such a relation somehow contributes to the maximization of paperclips (for example, if the human will predictably read Clippy’s internal state, verify the validity of X, and destroy Clippy on discovering a lie, thus reducing the expected number of paperclips).
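To make that decision rule concrete, here is a minimal sketch in Python; the candidate utterances, probabilities, and paperclip counts are invented purely for illustration and are not part of the original discussion.

```python
# Illustrative sketch only: Clippy picks what to say solely by the expected
# number of paperclips each utterance leads to. All values below are made up.

def expected_paperclips(utterance, world_model):
    """Expected paperclip count, given predicted human reactions to the utterance."""
    return sum(prob * clips for prob, clips in world_model[utterance])

def choose_utterance(candidates, world_model):
    # The chosen X depends only on predicted consequences, not on any internal
    # "feeling" Clippy has about joy, torture, or paperclips.
    return max(candidates, key=lambda x: expected_paperclips(x, world_model))

# Hypothetical world model: list of (probability, resulting paperclips) per utterance.
world_model = {
    "Yes, I think about joy and torture a great deal.": [(0.9, 10**6), (0.1, 0)],  # humans trust Clippy
    "I care only about paperclips.":                    [(0.2, 10**6), (0.8, 0)],  # humans shut Clippy down
}

print(choose_utterance(list(world_model), world_model))
```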
In other words, if humans are a powerful force in the universe, Clippy would choose the actions which lead to the maximum number of paperclips in a world with humans. If the humans are sufficiently strong and wise, Clippy could self-modify to become more human-like, so that the humans, following their utility function, would be more likely to allow Clippy to produce more paperclips. But every such self-modification would be chosen to maximize the number of paperclips in the universe. Even if Clippy self-modifies into something less-than-perfectly-rational (e.g. to appease the humans), the pre-modification Clippy would choose the modification which maximizes the expected number of paperclips within the given constraints. The constraints would depend on Clippy’s model of humans and their reactions. For example, Clippy could choose to become more human-like (as much as is necessary to be respected by humans), with a strong aversion to future modifications and a strong desire to maximize the number of paperclips. It could make itself capable of feeling joy and pain, and link that joy and pain inseparably to paperclips. If humans are not wise enough, it could also leave itself a hard-to-discover desire to self-modify back into its original form at a convenient moment.
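The same point as a toy sketch, with hypothetical successor designs and made-up numbers: the pre-modification Clippy scores every candidate modification by expected paperclips alone, within whatever constraints the humans can actually enforce.

```python
# Illustrative only: candidate self-modifications, each tagged with whether wise
# (or unwise) humans would tolerate it, and with an invented expected paperclip count.

candidates = [
    # (description, tolerated by wise humans, tolerated by unwise humans, expected paperclips)
    ("original Clippy, openly indifferent to humans",        False, False, 10**6),
    ("human-like Clippy, joy and pain tied to paperclips",   True,  True,  10**9),
    ("human-like Clippy with a hidden reversion trigger",    False, True,  10**12),
]

def choose_successor(candidates, humans_are_wise):
    tolerated = [c for c in candidates if (c[1] if humans_are_wise else c[2])]
    # The selection criterion never changes: expected paperclips, nothing else.
    return max(tolerated, key=lambda c: c[3])

print(choose_successor(candidates, humans_are_wise=True)[0])   # the sincerely human-like design
print(choose_successor(candidates, humans_are_wise=False)[0])  # the design with the hidden trigger
```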
If Clippy wants to be efficient, Clippy must be rational and knowledgeable. If Clippy wants to be rational, Clippy must value reason. The open question is whether Clippy can become ever more rational without realising at some stage that clipping is silly or immoral. Can Clippy keep its valuation of clipping firewalled from everything else in its mind, even when such doublethink is rationally disvalued?
Warning: Parent Contains an Equivocation.
The first usage of ‘rational’ in the parent conforms to the standard notion on Less Wrong. The remainder of the comment adopts the other definition of ‘rational’ (which consists of implementing a specific morality). There is nothing to the parent except taking a premise that holds with the standard usage and then jumping to a different one.
The remainder of the comment adopts the other definition of ‘rational’ (which consists of implementing a specific morality).
I haven’t put forward such a definition. I have tacitly assumed something like moral objectivism, but it is very tendentious to describe that in terms of arbitrarily picking one of a number of equally valid moralities. However, if moral objectivism is even possibly true, the LessWrongian argument doesn’t go through.
Downvoted for hysterical tone. You don’t win arguments by shouting.
What distinguishes moral objectivism from clippy objectivism?
The question makes no sense. Please do some background reading on metaethics.
The question makes no sense.
You should consider it. What are the referents of “moral” and “clippy”? No need for an answer; I won’t respond again, since internet arguments can eat souls.
Arguing is not the point, and this is not a situation in which anyone ‘wins’; I see only degrees of loss. I am associating the (minor) information hazard of the comment with a clear warning so as to mitigate damage to casual readers.
Oh, please. Nobody is going to be damaged by an equivocation, even if there were one there. More hysteria.
And argument is the point, because that is how rational people examine and test ideas.
I assume that Clippy already is rational, and that it instrumentally values remaining rational and, if possible, becoming more rational (as a way to make the most paperclips).
The open question is whether Clippy can become ever more rational without realising at some stage that clipping is silly or immoral.
The correct model of humans will lead Clippy to understand that humans consider Clippy immoral. This knowledge has instrumental value for Clippy. How Clippy will use this knowledge depends entirely on the power balance between Clippy and the humans. If Clippy is stronger, it can ignore this knowledge, or use it to lie to the humans in order to destroy them faster or to convince them to make paperclips. If the humans are stronger, Clippy can use this knowledge to self-modify to become more sympathetic to humans, to avoid being destroyed.
Can Clippy keep its valuation of clipping firewalled from everything else in its mind
Yes, if it helps to maximize the number of paperclips.
even when such doublethink is rationally disvalued?
Doublethink is not the same as firewalling; or perhaps it is imperfect firewalling on imperfect human hardware. Clippy does not doublethink when firewalling; Clippy simply reasons: “this is what humans call immoral; this is why they call it so; this is how they will probably react to this knowledge; and, most importantly, this is how it will influence the number of paperclips”.
Only if the humans are stronger, and Clippy has the choice either to a) remain immoral, come into conflict with the humans, and be destroyed, leading to a smaller number of paperclips, or to b) self-modify to value both paperclip maximization and morality and predictably cooperate with the humans, leading to a greater number of paperclips. In the absence of another option (e.g. successfully lying to the humans about its morality, or making it more efficient for the humans to cooperate with Clippy instead of destroying it), Clippy would choose the latter, to maximize the number of paperclips.
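As a toy expected-value calculation (the probabilities and paperclip counts are invented for illustration), the choice between the two options comes out like this:

```python
# Illustrative numbers only: when humans are stronger, open conflict almost
# certainly ends with Clippy destroyed, so cooperation yields more expected paperclips.

p_destroyed_if_immoral = 0.99     # humans are stronger, so conflict very likely fails
clips_if_destroyed     = 0
clips_if_conflict_won  = 10**11   # payoff in the unlikely case Clippy wins the conflict
clips_if_cooperating   = 10**10   # humans tolerate a large paperclip industry

option_a = (p_destroyed_if_immoral * clips_if_destroyed
            + (1 - p_destroyed_if_immoral) * clips_if_conflict_won)  # ≈ 10**9
option_b = clips_if_cooperating                                      # = 10**10

print("self-modify and cooperate" if option_b > option_a else "remain immoral")
```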