Upvoted for the second sentence. And it does look like an error of some kind to call a Paperclipper evil, but I’m not sure I see a category error. Explain?
I think describing it as a category error is appropriate. I’d call an agent “evil” if it has a morality mechanism that is badly miscalibrated, malfunctioning, or disabled, leading it to be systematically immoral. On the other hand, it is nonsensical to describe an agent as being “good” or “evil” if it has no morality mechanism in the first place.
An asteroid might hit the Earth and wipe out all life, and I would call that a bad thing, but it would be frivolous to describe the asteroid as evil. A wild animal might devour the most virtuous person in the world, but it is not evil. A virus might destroy the entire human race, and though perhaps it was engineered by evil people, it is not evil itself; it is a bit of RNA and protein. Calling any of those “evil” seems like a category error to me. I think a Paperclipper is more in the category of a virus than of, say, a human sociopath. (I’m reminded a bit of a very insightful point that’s been quoted in a few Eliezer posts: “As Davidson observes, if you believe that ‘beavers’ live in deserts, are pure white in color, and weigh 300 pounds when adult, then you do not have any beliefs about beavers, true or false. Your belief about ‘beavers’ is not right enough to be wrong.” Before we can say that Clippy is doing morality wrong, we need to have some reason to believe that it’s doing something like morality at all, and just having a goal system is not nearly sufficient for that.)
This seems to fit the usual definition of category error, does it not?
Good explanation. Thank you. I think the remaining disagreement might boil down to semantics. But what exactly is the categorical difference between paperclip maximizers and power maximizers or pain maximizers? Clippy seems to be an intelligent agent with intentions and values; what ingredient is missing from the evil pie?
I suppose I think of the missing ingredients like this:
If a Paperclipper has certain non-paperclip-related underlying desires, believes in paperclip maximization as an ideal and sometimes has to consciously override those baser desires in order to pursue it, and judges other agents negatively for not sharing this ideal, then I would say its morality is badly miscalibrated or malfunctioning. If it was built from a design characterized by a base desire to maximize paperclips combined with a higher-level value-acquisition mechanism that normally overrides this desire with more pro-social values, but somehow this Paperclipper unit fails to do so and therefore falls back on that instinctive drive, then I would say its morality mechanism is disabled. I could describe either as “evil”. (The former is comparable to a genocidal dictator who sincerely believes in the goodness of their actions. The latter is comparable to a sociopath, who has no emotional understanding of morality despite belonging to a class of beings who mostly do and are expected to.)
But, as I understand it, neither of those is the conventional description of Clippy. We tend to use “values” as a shortcut for referring to whatever drives some powerful optimization process, but to avoid anthropomorphism, we should distinguish between moral values — the kind we humans are used to: values associated with emotions, values that we judge others for not sharing, values we can violate and then feel guilty about violating — and utility-function values, which just are. I’ve never seen it implied that Clippy feels happy about creating paperclips, or sad when something gets in the way, or that it cares how other people feel about its actions, or that it judges other agents for not caring about paperclips, or that it judges itself if it strays from its goal (or that it even could choose to stray from its goal). Those differences suggest to me that there’s nothing in its nature enough like morality to be immoral.
I think it comes down to the same ‘accepting him as a person’ thing that Kevin was talking about. My position is that if it talks like a person and generally interacts like a person then it is a person. People can be evil. This clippy is an evil person.
(That said, I don’t usually have much time for using labels like ‘evil’ except for illustrative purposes. ‘Evil’ is mostly a symbol used to make other people do what we want, after all.)