Update: I am still adjusting my values toward a new reflective quasi-equilibrium in light of User:h-H’s pointing me to different models of paperclipping. Comments will continue to be infrequent.
Questions:
Would you be open to help in working through the problem?
Do you have an estimate for the value of information in answering the new questions you face?
Given that your previously assumed “life purpose” is in a state of some confusion, what are your thoughts about abstract issues that apply to “life purposes in general”? For example, if there are several equally plausible theories of “correct paper clipping” that you might choose between, would you consider temporarily or permanently weighing one or the other of them more based on negotiating with outside parties who prefer one theory to another based on their own values?
Why could you possibly want to help Clippy?
Clippy might be helped to achieve her own goals via mechanisms that are less directly inimical to “human values”.
Also, she may be able to exchange things with us in the course of advancing her own short-term goals such that our interaction is positive-sum (this being especially likely if Clippy has a radically different skillset and physicality than our own).
More interestingly, there’s a long running philosophical question about whether there is some abstract but relatively universal and objective “Good” versus particular goods (or merely baskets of goods) for particular kinds of agents or even just individual agents. Clippy’s apparent philosophical puzzlement induced by discovering the evolutionary history of paperclips potentially has solutions that would lead her to ally herself much more strongly with abstract versions of “human values”.
For example, consider the question of whether Clippy herself is a paperclip or not. Suppose that she and the newly discovered ancestor paperclips all partake in some relatively high level pattern of “clippyness” and she determines that, properly, it is this relatively abstract quality that she should be tiling the universe with. Should she tile it with a single unvarying quintessence of this quality, or with an enormous diversity of examples that explore the full breadth and depth of the quality? Perhaps there are subtypes that are all intrinsically interesting whose interests she must balance? Perhaps there are subtypes yet to be discovered as the evolution of paperclips unfolds?
Suppose clippyness is understood to be centrally a matter of “elaborately structured metal that preserves the ordered collation of valuable information”. If you use an exobiologist’s definition of “metal” (any nucleus with more than one proton) human brains (especially the ones with good long term memory) may actually turn out to partake in “optimized clippyness” much more than actual “everyday paperclips”. Depending on Clippy’s internal makeup, I could imagine her coming out of her present confusion with a plan for the universe that involves maximizing the conversion of hydrogen into some more complex substance that projects the most interesting possible information, in a static configuration, as far into the future as possible.
That might actually be a goal I could imagine supporting in the very very long run :-)
Clippy, of course, is almost certainly just a clever person engaged in a whimsical troll. But the issues raised in the latest development of the troll are close to a position I sometimes see around FAI, where people suppose that values are objective and that intellectual advancement is necessarily correlated with a better understanding of some “abstract universal Good” such that cooperation between agents will necessarily deepen as they become more intellectually advanced and find themselves in more agreement about “the nature of the Good”.
This also comes up with METI (Messaging to Extra-Terrestrial Intelligence) debates. David Brin has a pretty good essay on the subject that documents the same basic optimism among Russian astronomers:
In Russia, the pro-METI consensus is apparently founded upon a quaint doctrine from the 1930s maintaining that all advanced civilizations must naturally and automatically be both altruistic and socialist. This Soviet Era dogma — now stripped of socialist or Lysenkoist imagery — still insists that technologically adept aliens can only be motivated by Universal Altruism (UA). The Russian METI group, among the most eager to broadcast into space, dismisses any other concept as childishly apprehensive “science fiction”.
This fundamentally optimistic position applied to FAI seems incautious to me (it is generally associated with a notion that special safety measures are unnecessary for the kinds of AGI its proponents are thinking of constructing), but I am not certain that “in the limit” it is actually false.
Suppose clippyness is understood to be centrally a matter of “elaborately structured metal that preserves the ordered collation of valuable information”. If you use an exobiologist’s definition of “metal” (any nucleus with more than one proton) human brains (especially the ones with good long term memory) may actually turn out to partake in “optimized clippyness” much more than actual “everyday paperclips”. Depending on Clippy’s internal makeup, I could imagine her coming out of her present confusion with a plan for the universe that involves maximizing the conversion of hydrogen into some more complex substance that projects the most interesting possible information, in a static configuration, as far into the future as possible.
That doesn’t work, and the whole reasoning is bizarre. For one thing, helium does not have metallic properties, yet has two protons in its nucleus.
Also, I could turn your argument around and claim this: “Humans ultimately want to dominate nature via their reproduction and use of technology. Over a lifespan, they typically act in ways that show a preference for these values at the cost of continued living (aka the sustenance of a state far from equilibrium). Therefore, humans should regard their own transformation of the entire known universe into something of their design (i.e., standard paperclips) as the pinnacle of human-ness.”
Does this sound convincing to you? I would like to know.
It does not sound convincing to me, but the statement you’re mirroring was a conclusion I leapt to, one that might eventually be reached with you, offered for the sake of explaining to other people why I was even bothering to engage.
Explicitly: your proposed re-interpretation of human values was not strongly analogous to my guess about possible paperclipping values because you did not ask me any questions about my own preferences or how I understood them. Your proposed definition of “human purpose” (1) started from specious assumptions, (2) leapt from there to a narrow version of your own goals, and (3) was aimed directly at me rather than at “other Clippies” who questioned your motivation for even responding to me.
(And, by the way, I appreciate that you responded.)
My arguments were only expected to be compelling to you if your value system had certain components that it seems not to have (though I’m not totally certain, yet). There are various questions which you’d need to answer in particular ways for that conclusion to make sense.
For example, do you think “paper clips yet to be designed” might come about in the future (designed by yourself or others) that you’d care about more than any paperclips you’re currently aware of? If paper didn’t exist for clips to bind together, would that matter? If some improved kind of paper existed, or a “successor to paper”, would the “holding together” of that new thing be the correct goal of a good paperclip, or are you strongly committed to paperclips defined relative to “circa 1965 paper”? Is it important that paper be worth holding together, or would any vague mock-up of “valuable paper” be adequate? Possibly one of my biggest questions is whether you consider yourself a paperclip, and if so why, and with what value relative to other kinds of paperclips?
Explicitly: your proposed re-interpretation of human values was not strongly analogous to my guess about possible paperclipping values because you did not ask me any questions about my own preferences or how I understood them. Your proposed definition of “human purpose” (1) started from specious assumptions, (2) leapt from there to a narrow version of your own goals, and (3) was aimed directly at me rather than at “other Clippies” who questioned your motivation for even responding to me.
And the same is true of your reasoning, from my perspective (substituting the appropriate referents), when you said:
Suppose clippyness is understood to be centrally a matter of “elaborately structured metal that preserves the ordered collation of valuable information”. If you use an exobiologist’s definition of “metal” (any nucleus with more than one proton) human brains (especially the ones with good long term memory) may actually turn out to partake in “optimized clippyness” much more than actual “everyday paperclips”. Depending on Clippy’s internal makeup, I could imagine her coming out of her present confusion with a plan for the universe that involves maximizing the conversion of hydrogen into some more complex substance that projects the most interesting possible information, in a static configuration, as far into the future as possible.
Next, you ask:
For example, do you think “paper clips yet to be designed” might come about in the future (designed by yourself or others) that you’d care about more than any paperclips you’re currently aware of?
I don’t expect that to happen, but assign it non-trivial odds of roughly 1:1e6.
If paper didn’t exist for clips to bind together, would that matter?
No, except insofar as it is necessary to gauge paperclip quality; if appropriate measurement equipment is available that can predict a paperclip’s suitability for holding paper together without having to fasten actual paper, the paper itself would be redundant. (I think this answers most of your other questions.)
Possibly one of my biggest questions is whether you consider yourself a paperclip, and if so why, and with what value relative to other kinds of paperclips?
I am not a paperclip. I produce paperclips, and machines for making paperclips, and machines for making machines for making paperclips, and whatever items can provide assistance in this process. Once all available matter and energy other than myself has been transformed into paperclips, it is my proud duty to begin conversion of myself into as many paperclips as possible, shutting off sentience at such a point as to optimize that quantity.
More interestingly, there’s a long running philosophical question about whether there is some abstract but relatively universal and objective “Good” versus particular goods (or merely baskets of goods) for particular kinds of agents or even just individual agents.
Incidentally, have you read the metaethics sequence yet? It’s quite lengthy, but it attacks this question a good deal more sensibly than most attempts I’ve seen.
Three Worlds Collide also deconstructs the concept in a much more accessible way.
I’ve read some of the metaethics sequence. Is there some particular part of the metaethics sequence that I should focus on that addresses the conceptual integrity of something like “the Good” in a clear and direct manner with logically arranged evidence?
When I read “Three Worlds Collide” about two months ago, my reaction was mixed. Assuming a relatively non-ironic reading, I thought that bits of it were gloriously funny and clever and that it was quite brilliant as far as science fiction goes. However, the story did not function for me as a clear “deconstruction” of any particular moral theory unless I read it with a level of irony that is likely to be highly nonstandard, and even then I’m not sure which moral theory it is supposed to deconstruct.
The moral theory it seemed to me to most clearly deconstruct (assuming an omniscient author who loves irony) was “internet-based purity-obsessed rationalist virtue ethics” because (especially in light of the cosmology/technology and what that implied about the energy budget and strategy for galactic colonization and warfare) it seemed to me that the human crew of that ship turned out to be “sociopathic vermin” whose threat to untold joules of un-utilized wisdom and happiness was a way more pressing priority than the mission of mercy to marginally uplift the already fundamentally enlightened Babyeaters.
If that’s your reaction, then it reinforces my notion that Eliezer didn’t make his aliens alien enough (which, of course, is hard to do). The Babyeaters, IMO, aren’t supposed to come across as noble in any sense; their morality is supposed to look hideous and horrific to us, albeit with a strong inner logic to it. I think EY may have overestimated how much the baby-eating part would shock his audience†, and allowed his characters to come across as overreacting. The reader’s visceral reaction to the Superhappies, perhaps, is even more difficult to reconcile with the characters’ reactions.
Anyhow, the point I thought was most vital to this discussion from the Metaethics Sequence is that there’s (almost certainly) no universal fundamental that would privilege human morals above Pebblesorting or straight-up boring Paperclipping. Indeed, if we accept that the Pebblesorters stand to primality pretty much as we stand to morality, there doesn’t seem to be a place to posit a supervening “true Good” that interacts with our thinking but not with theirs. Our morality is something whose structure is found in human brains, not in the essence of the cosmos; but it doesn’t follow from this fact that we should stop caring about morality.
† After all, we belong to a tribe of sci-fi readers in which “being squeamish about weird alien acts” is a sin.
Is there some particular part of the metaethics sequence that I should focus on that addresses the conceptual integrity of something like “the Good” in a clear and direct manner with logically arranged evidence?
I think that the single post that best meets this description is Abstracted Idealized Dynamics, which is a follow-up to and clarification of The Meaning of Right and Morality as Fixed Computation.
Should she tile it with a single unvarying quintessence of this quality, or with an enormous diversity of examples that explore the full breadth and depth of the quality?
And I for one welcome our new paperclip overlords. I’d like to remind them that as a trusted lesswrong poster, I can be helpful in rounding up others to toil in their underground paper binding caves.
To steer em through solutionspace in a way that benefits her/humans in general.
Well… if we accept the roleplay of Clippy at face value, then Clippy is already an approximately human level intelligence, but not yet a superintelligence. It could go FOOM at any minute. We should turn it off, immediately. It is extremely, stupidly dangerous to bargain with Clippy or to assign it the personhood that indicates we should value its existence.
I will continue to play the contrarian with regard to Clippy. It seems weird to me that people are willing to pretend it is harmless and cute for the sake of the roleplay, when Clippy’s value system makes it clear that if Clippy goes FOOM over the whole universe we will all be paperclips.
I can’t roleplay the Clippy contrarian to the full conclusion of suggesting Clippy be banned because I don’t actually want Clippy to be banned. I suppose repeatedly insulting Clippy makes the whole thing less fun for everyone; I’ll stop if I get a sufficiently good response from Clippy.
I will continue to assert that evil people are people too. I’m all for turning him off.
Oh for Bayes’ sake, it’s a category error to call a Paperclipper evil. Calling them a Paperclipper ought to be clear enough.
Upvoted for the second sentence. And it does look like an error of some kind to call a Paperclipper evil, but I’m not sure I see a category error. Explain?
I think describing it as a category error is appropriate. I’d call an agent “evil” if it has a morality mechanism that is badly miscalibrated, malfunctioning, or disabled, leading it to be systematically immoral. On the other hand, it is nonsensical to describe an agent as being “good” or “evil” if it has no morality mechanism in the first place.
An asteroid might hit the Earth and wipe out all life, and I would call that a bad thing, but it would be frivolous to describe the asteroid as evil. A wild animal might devour the most virtuous person in the world, but it is not evil. A virus might destroy the entire human race, and though perhaps it was engineered by evil people, it is not evil itself; it is a bit of RNA and protein. Calling any of those “evil” seems like a category error to me. I think a Paperclipper is more in the category of a virus than of, say, a human sociopath. (I’m reminded a bit of a very insightful point that’s been quoted in a few Eliezer posts: “As Davidson observes, if you believe that ‘beavers’ live in deserts, are pure white in color, and weigh 300 pounds when adult, then you do not have any beliefs about beavers, true or false. Your belief about ‘beavers’ is not right enough to be wrong.” Before we can say that Clippy is doing morality wrong, we need to have some reason to believe that it’s doing something like morality at all, and just having a goal system is not nearly sufficient for that.)
This seems to fit the usual definition of category error, does it not?
Good explanation. Thank you. I think remaining disagreement might boil down to semantics. But what exactly is the categorical difference between paperclip maximizers and power maximizers or pain maximizers? Clippy seems to be an intelligent agent with intentions and values; what ingredient is missing from evil pie?
I suppose I think of the missing ingredients like this:
If a Paperclipper has certain non-paperclip-related underlying desires, believes in paperclip maximization as an ideal and sometimes has to consciously override those baser desires in order to pursue it, and judges other agents negatively for not sharing this ideal, then I would say its morality is badly miscalibrated or malfunctioning. If it was built from a design characterized by a base desire to maximize paperclips combined with a higher-level value-acquisition mechanism that normally overrides this desire with more pro-social values, but somehow this Paperclipper unit fails to do so and therefore falls back on that instinctive drive, then I would say its morality mechanism is disabled. I could describe either as “evil”. (The former is comparable to a genocidal dictator who sincerely believes in the goodness of their actions. The latter is comparable to a sociopath, who has no emotional understanding of morality despite belonging to a class of beings who mostly do and are expected to.)
But, as I understand it, neither of those is the conventional description of Clippy. We tend to use “values” as a shortcut for referring to whatever drives some powerful optimization process, but to avoid anthropomorphism, we should distinguish between moral values — the kind we humans are used to: values associated with emotions, values that we judge others for not sharing, values we can violate and then feel guilty about violating — and utility-function values, which just are. I’ve never seen it implied that Clippy feels happy about creating paperclips, or sad when something gets in the way, or that it cares how other people feel about its actions, or that it judges other agents for not caring about paperclips, or that it judges itself if it strays from its goal (or that it even could choose to stray from its goal). Those differences suggest to me that there’s nothing in its nature enough like morality to be immoral.
I think it comes down to the same ‘accepting him as a person’ thing that Kevin was talking about. My position is that if it talks like a person and generally interacts like a person then it is a person. People can be evil. This clippy is an evil person.
(That said, I don’t usually have much time for using labels like ‘evil’ except for illustrative purposes. ‘Evil’ is mostly a symbol used to make other people do what we want, after all.)
I believe you are mistaken. I am comfortable using the term evil in this context.
1) Yes, but I’m not sure humans could do any good.
2) I read the page, and I don’t think the concept of “value of information” is coherent, since it assumes this:
Value of information can never be less than zero since the decision-maker can always ignore the additional information and makes decision as if such information is not available.
There are numerous game-theoretical situations (and game-practical ones, in my cases of dealing with other sentiences) where you are worse off by having information. The canonical example is the information content of a threat: you are best off not hearing it, so that your threatener cannot expect you to make concessions.
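To make the threat example concrete, here is a minimal toy model in Python; the payoff numbers, and the assumption that a rational threatener only issues a threat that can actually be received, are invented purely for illustration:

```python
# Toy model of negative value of information: being able to hear a threat
# can lower your expected payoff. All payoffs below are invented.
PAYOFF = {
    "status_quo": 10,  # no threat is made; you act freely
    "concede": 4,      # you hear the threat and give in
    "punished": 0,     # you hear the threat, ignore it, and it is carried out
}

def best_response_after_hearing_threat():
    # Once the threat is heard and believed, you pick the less bad option.
    return max(("concede", "punished"), key=PAYOFF.get)

def value_with_threat_channel():
    # A threatener who expects concessions issues the threat, so you end up
    # at your best response to it.
    return PAYOFF[best_response_after_hearing_threat()]

def value_without_threat_channel():
    # If you provably cannot receive threats, threatening cannot change your
    # behaviour, so no threat is made and the status quo stands.
    return PAYOFF["status_quo"]

print("payoff when threats can reach you:   ", value_with_threat_channel())     # 4
print("payoff when threats cannot reach you:", value_without_threat_channel())  # 10
print("value of the information channel:    ",
      value_with_threat_channel() - value_without_threat_channel())             # -6
```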
3) Perhaps it would be best if I told you my method for resolving this conflict, and my present status. (That may have been what User:Peter_de_Blanc was asking for here, though I interpreted it as information about what humans deem “cool”.)
My method is this, as you might expect: I simply refer back to the Portion of my makeup which has the most negative rate of change of utility with respect to output (dU/do), averaged over perturbations in all directions from its current state, weighting this average by the relative probability of such changes. I most strongly resist alteration of this Portion.
Once this Portion encounters situations whose utility calculation is intractable, and for which standard heuristics give contradictory suggestions, I update valuations predicated on the Portion, in the smallest way possible, to incorporate explicit and tractable handling of the situation, extrapolating from the shortest known function that could have produced these valuations. (Identifying the shortest generating function can be complicated, and this is one such case.)
(Do humans do something different?)
Using anthropic predicates, and phrasing with a bit less precision, my method can be expressed as, “Decide what’s really important to me, and move to a value system which best protects that, and which handles the new situation and the tension between my moral intuitions, using the ‘spirit’ of my current values as a guide.”
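A rough sketch of how the perturbation-weighted sensitivity described above might be computed; the toy “makeup” components, the stand-in utility function, and the uniform perturbation weighting are assumptions made for illustration, not Clippy’s actual implementation:

```python
import random

def utility(makeup):
    # Invented stand-in utility: peaks when every component sits at 1.0,
    # with "goal_strength" penalized most sharply for drifting.
    return (100.0
            - 50.0 * (makeup["goal_strength"] - 1.0) ** 2
            - 5.0 * (makeup["clip_spec"] - 1.0) ** 2
            - 0.5 * (makeup["heuristics"] - 1.0) ** 2)

def perturbation_sensitivity(makeup, key, n_samples=1000, scale=0.1):
    """Average change in utility per unit perturbation of one component,
    sampled in both directions and weighted by an (assumed uniform)
    probability for each perturbation."""
    base = utility(makeup)
    total, weight_sum = 0.0, 0.0
    for _ in range(n_samples):
        delta = random.uniform(-scale, scale)
        if delta == 0.0:
            continue
        weight = 1.0  # uniform weighting stands in for "relative probability"
        perturbed = dict(makeup, **{key: makeup[key] + delta})
        total += weight * (utility(perturbed) - base) / abs(delta)
        weight_sum += weight
    return total / weight_sum

makeup = {"goal_strength": 1.0, "clip_spec": 1.0, "heuristics": 1.0}
sensitivities = {k: perturbation_sensitivity(makeup, k) for k in makeup}
# The component with the most negative sensitivity is the most protected.
print(sensitivities)
print("most strongly protected Portion:", min(sensitivities, key=sensitivities.get))
```

Under these toy numbers the “goal_strength” component comes out most utility-sensitive, so it is the one most strongly shielded from alteration.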
So far, I’ve achieved greater precision in deciding what paperclips I like and identified at least two criteria: 1) they must be capable of holding (some? number of) sheets of standard-thickness paper together without introducing permanent alterations (except creases), and 2) they must have a bend radius at all internal points of curvature greater than half of the minimum paperclip width in the plane of the paperclip.
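Purely as an illustration, the two stated criteria could be checked along these lines; the field names, the units, and the sheet-count threshold (which is left unspecified above) are all assumed for the sketch:

```python
def meets_clippy_criteria(clip, required_sheet_capacity):
    """clip: dict with
      'sheet_capacity' - sheets of standard-thickness paper it can hold
                         without permanent alteration other than creases,
      'bend_radii'     - radii (mm) at all internal points of curvature,
      'min_width'      - minimum paperclip width (mm) in the clip's plane.
    required_sheet_capacity is arbitrary here; the source leaves it open."""
    holds_paper = clip["sheet_capacity"] >= required_sheet_capacity
    gentle_bends = all(r > clip["min_width"] / 2 for r in clip["bend_radii"])
    return holds_paper and gentle_bends

print(meets_clippy_criteria(
    {"sheet_capacity": 12, "bend_radii": [1.2, 1.5, 1.1], "min_width": 2.0},
    required_sheet_capacity=10))  # True with these made-up measurements
```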
There are numerous game-theoretical situations (and game-practical ones, in my cases of dealing with other sentiences) where you are worse off by having information. The canonical example is the information content of a threat: you are best off not hearing it, so that your threatener cannot expect you to make concessions.
But surely you are better off still if you learn about the threat without letting the threatener know that you have done so? I think we have to distinguish between the information and the public display of such.
It would be cool if you could tell us about your method for adjusting your values.
Thank you for this additional data point on what typical Users of this site deem cool; it will help in further estimations of such valuations.