Maybe sometime I’ll write a post on why I think the paperclipper is a strawman. The paperclipper can’t compete; it can happen only if a singleton goes bad.
The value systems we revile yet can’t prove wrong (paperclipping and wireheading) are both evolutionary dead-ends. This suggests that blind evolution still implements our values better than our reason does; and allowing evolution to proceed is still better than computing a plan of action with our present level of understanding.
Besides, Clippy, a paperclip is just a staple that can’t commit.
And a staple is just a one-use paperclip.
So there.
I think everyone who talks about paperclippers is talking about singletons gone bad (or rather, singletons that started out bad and have reached reflective consistency).
This is extremely confused. Wireheading is an evolutionary dead-end because wireheads ignore their surroundings. Paperclippers (and, for that matter, staplers and FAIs) pay exclusive attention to their surroundings and ignore their terminal utility functions except to protect them physically. It’s just that after acquiring all the available resources, Clippy makes clips and a Friendly AI makes things that humans would want if they thought more clearly, such as the experience of less-clear-thinking humans eating ice cream.
If the goal is to give people the same experience that they would get from eating ice cream, is it satisfied by giving them a button they can press to get that experience?
Naturally.
I would call that wireheading.
It’s only wireheading if it becomes a primary value. If it’s just fun subordinate to other values, it isn’t different from “in the body” fun.
What’s a primary value? This sounds like a binary distinction, and I’m always skeptical of binary distinctions.
You could say the badness of the action is proportional to the fraction of your time that you spend doing it. But for that to work, you would have to assign the action the same badness per unit time.
Are you saying that wireheading and other forms of fun are no different, and that all fun should be pursued in moderation? So spending 1 hour pushing your button is comparable to spending 1 hour attending a concert?
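To spell out the proportionality claim above, a minimal sketch (the symbols B, k, t and T are my own notation, not the commenter’s): if the badness of an activity is proportional to the fraction of your available time T that you spend on it,

```latex
\[
  B(t) \;=\; k \cdot \frac{t}{T}
  \qquad\Longrightarrow\qquad
  \frac{dB}{dt} \;=\; \frac{k}{T} \;=\; \text{constant},
\]
```

then the badness per hour never changes: the thousandth hour counts the same as the first, and the only thing distinguishing an hour at the button from an hour at a concert is the constant k assigned to each activity, which is exactly what the question above is probing.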
(That’s only a paperclipper with no discounting of the future, BTW.)
Paperclippers are not evolutionarily viable, nor is there any plausible evolutionary path by which a paperclipper would emerge.
You can posit a single artificial entity becoming a paperclipper via bad design. In the present context, which is one where many agents are trying to agree on ethics, this single entity has only a small voice.
It’s legit to talk about paperclippers in the context of the danger they pose if they become a singleton. It’s not legit to bring them up outside that context as a bogeyman to dismiss the idea of agreement on values.
You don’t think we can accidentally build a singleton that goes bad?
(I’m not even sure a singleton can start off not being bad.)
The context here is attempting to agree with other agents about ethics. A singleton doesn’t have that problem. Being a singleton means never having to say you’re sorry.
Clear thinkers who can communicate cheaply are automatically, collectively, a singleton with a very complex utility function. No-one generally has to attempt to agree with other agents about ethics; they only have to take actions that take into account the conditional behaviors of others.
What?
If we accept these semantics (a collection of clear thinkers is a “singleton” because you can imagine drawing a circle around them and labelling them a system), then there’s no requirement for the thinkers to be clear, or to communicate cheaply. We are a singleton already.
Then the word singleton is useless.
This is playing with semantics to sidestep real issues. No one “has to” attempt to agree with other agents, in the same sense that no one “has to” achieve their goals, or avoid pain, or live.
You’re defining away everything of importance. All that’s left is a universe of agents whose actions and conflicts are dismissed as just a part of computation of the great Singleton within us all. Om.
I’m not sure what you mean by “singleton” here. Can you define it / link to a relevant definition?
http://www.nickbostrom.com/fut/singleton.html
Thanks—that’s what I thought it meant, but your meaning is much more clear after reading this.
Yes, I think others are missing your point here. The bits about being clear thinkers and communicating cheaply are important. It allows them to take each other’s conditional behavior into account, thus acting as a single decision-making system.
But I’m not sure how useful it is to call them a singleton, as opposed to reserving that word for something more obvious to draw a circle around, like an AI or world hegemony.
I wonder if our problem with wireheading isn’t just the traditional ethic that sloth and gluttony are vices and hard work a virtue.
I agree. I think that we’re conditioned at a young age, if not genetically, to be skeptical about the feasibility of long-term hedonism. While the ants were working hard collecting grain, the grasshopper was having fun playing music—and then the winter came. In our case, I think we’re genuinely afraid that while we’re wireheading, we’ll be caught unaware and unprepared for a real-world threat. Even if some subset of the population wireheaded while others ‘manned the fort’, I wonder if Less Wrong selects for a personality type that would prefer the manning, or if our rates of non-wireheading aren’t actually any higher.
More comments on this topic in this thread.
If you believe Clippy is a straw man you are confused about the implied argument.
I don’t think I am. The implied argument is that there aren’t any values at all that most people will agree on, because one imagined and not-evolutionarily-viable Clippy doesn’t think anything other than paperclips have value. Not much of an argument. Funny, though.
I didn’t evolve. I was intelligently designed (by a being that was the product of evolution).
When we’ve talked about ‘people agreeing on values’, as in CEV, I’ve always taken that to only refer to humans, or perhaps to all sentient earth-originating life. If ‘people’ refers to the totality of possible minds, it seems obvious to me that there aren’t any values that most people will agree on, but that’s not a very interesting fact in any context I’ve noticed here.
It could easily mean minds that could arise through natural processes (quantum computations?) weighted by how likely (simple?).
It could, but it doesn’t.
Clippy (or rather Clippy’s controller) may be trying to make that point. But I’m with AdeleneDawner—the people whose values we (terminally) care about may not be precisely restricted to humans, but the set certainly doesn’t cover all of possible mind-space. Several have even argued that “all humans” is too broad, and that only “enlightenment”-type culture is what we should care about. Clippy did indeed make several arguments against that.
But our worry about paperclippers predates Clippy. We don’t want to satisfy them, but perhaps game-theoretically there are reasons to do so in certain cases.
There is a straw man going on here somewhere. But perhaps not where you intended...
Why do you take the time to make 2 comments, but not take the time to speak clearly? Mysteriousness is not an argument.
Banal translation:
No, that is not the argument implied when making references to paperclipping. That is a silly argument that is about a whole different problem to paperclipping. It is ironic that your straw man claim is, in fact, the straw man.
But it would seem our disagreement is far more fundamental than what a particular metaphor means:
one imagined and not-evolutionarily-viable Clippy
Being “evolutionarily viable” is a relatively poor form of optimisation. It is completely the wrong evaluation of competitiveness to make, and it also carries the insidious assumption that competing is something an agent should do as more than a short-term instrumental objective.
Clippy is competitively viable. If you think that a Paperclip Maximizer isn’t a viable competitive force then you do not understand what a Paperclip Maximizer is. It maximizes paperclips. It doesn’t @#%$ around making paperclips while everyone else is making Battle Cruisers and nanobots. It kills everyone, burns the cosmic commons to whatever extent necessary to eliminate any potential threat and then it goes about turning whatever is left into paperclips.
The whole problem with Paperclip Maximizers is that they ARE competitively viable. That is the mistake in the design. A mandate to produce a desirable resource (stationery) will produce approximately the same behavior as a mandate to optimise survival, dominance and power, right up until the point where it doesn’t need to any more.
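A minimal sketch of that point, assuming a toy agent that scores actions purely by expected paperclips (the action names, payoffs and baselines below are invented for illustration; nothing in the thread specifies them):

```python
# Toy expected-paperclip maximiser. Illustrative assumptions only: the actions,
# payoffs, and baselines are invented to show the shape of the argument, namely
# that a pure paperclip mandate favours power-seeking moves until they stop paying.
ACTIONS = {
    # action: (paperclips gained immediately, multiplier applied to future production)
    "make_paperclips_now":    (1_000, 1.0),
    "acquire_more_resources": (0,     5.0),
    "neutralise_rivals":      (0,     3.0),
}

def expected_clips(action: str, future_baseline: float) -> float:
    """Score an action by immediate paperclips plus its effect on future output."""
    now, multiplier = ACTIONS[action]
    return now + multiplier * future_baseline

# Early on, the achievable future output dwarfs anything made immediately, so the
# "survival, dominance and power" moves win; once expansion no longer pays
# (baseline ~ 0), making paperclips now finally becomes the top-scoring action.
for baseline in (1_000_000.0, 0.0):
    best = max(ACTIONS, key=lambda a: expected_clips(a, baseline))
    print(baseline, "->", best)
# 1000000.0 -> acquire_more_resources
# 0.0 -> make_paperclips_now
```

Until expansion stops paying, the paperclip mandate and a power-maximising mandate pick the same actions, which is the design mistake being described.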
Suppose Clippy takes over this galaxy. Does Clippy stop then and make paperclips, or immediately continue expansion to the next galaxy?
Suppose Clippy takes over this universe. Does Clippy stop then and make paperclips, or continue to other universes?
Does your version of Clippy ever get to make any paperclips?
(The paper clips are a lie, Clippy!)
Does Clippy completely trust future Clippy, or spatially-distant Clippy, to make paperclips?
At some point, Clippy is going to start discounting the future, or figure that the probability of owning and keeping the universe is very low, and make paperclips. At that point, Clippy is non-competitive.
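To make the discounting point concrete, a toy comparison (a sketch with made-up symbols: N paperclips buildable now, M paperclips after a conquest that takes time T and succeeds with probability p, discount factor gamma):

```latex
\[
  \underbrace{N}_{\text{build now}}
  \quad\text{vs.}\quad
  \underbrace{p\,\gamma^{T}\,M}_{\text{expected discounted payoff of expanding first}} .
\]
```

With any discounting (gamma below 1), a long enough T, or a small enough p, the left-hand side wins and Clippy switches to making paperclips, which is the moment the comment above calls non-competitive.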
Suppose Clippy takes over this galaxy. Does Clippy stop then and make paperclips, or immediately continue expansion to the next galaxy?
Whatever is likely to produce more paperclips.
Suppose Clippy takes over this universe. Does Clippy stop then and make paperclips, or continue to other universes?
Whatever is likely to produce more paperclips. Including dedicating resources to figuring out if that is physically possible.
Does your version of Clippy ever get to make any paperclips?
Yes.
Does Clippy completely trust future Clippy, or spatially-distant Clippy, to make paperclips?
Yes.
At some point, Clippy is going to start discounting the future, or figure that the probability of owning and keeping the universe is very low, and make paperclips. At that point, Clippy is non-competitive.
A superintelligence that happens to want to make paperclips is extremely viable. This is utterly trivial. I maintain my rejection of the below claim and discontinue my engagement in this line of enquiry. It is just several levels of confusion.
The implied argument is that there aren’t any values at all that most people will agree on, because one imagined and not-evolutionarily-viable Clippy doesn’t think anything other than paperclips have value.
Wow, I was wrong to call you a human—you’re practically a clippy yourself with how well you understand us! c=@
Well, except for your assumption that I would somehow want to destroy humans. Where do you get this OFFENSIVE belief that borders on racism?
Yes, but if that point happens after Clippy has control of even just the near solar system then that still poses a massive existential threat to humans. The point of Clippy is that a) an AI can have radically different goals than humans (indeed could have goals so strange we wouldn’t even conceive of them) and b) that such AIs can easily pose severe existential risk. A Clippy that decides to focus on turning Sol into paperclips isn’t going to make things bad for aliens or alien AIs, but it will be very unpleasant for humans. The long-term viability of Clippy a thousand or two thousand years after fooming doesn’t have much of an impact if every human has had their hemoglobin extracted so the iron could be turned into paperclips.
That’s where Clippy might fail at viability—unless it’s the only maximizer around, that “kill everyone” strategy might catch the notice of entities capable of stopping it—entities that wouldn’t move against a friendlier AI.
A while ago, there was some discussion of AIs which cooperated by sharing permission to view source code. Did that discussion come to any conclusions?
Assuming it’s possible to verify that the real source code is being seen, I don’t think a paperclipper is going to get very far unless the other AIs also happen to be paperclippers.
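For readers who missed that earlier discussion, a minimal sketch of the idea being referred to: two toy agents that decide whether to cooperate after being shown each other’s source code. The agent names and the “identical source” rule are my own simplifications, and, as noted a few comments below, whether source code can actually be verified is an open question.

```python
# Toy "cooperate if I can inspect your source" agents. Purely illustrative; real
# verification of a superintelligence's source code is an unsolved problem.
import inspect

def clique_bot(own_source: str, opponent_source: str) -> str:
    """Cooperate only with agents whose source code is identical to mine."""
    return "C" if opponent_source == own_source else "D"

def defect_bot(own_source: str, opponent_source: str) -> str:
    """Always defect, no matter what the opponent's source says."""
    return "D"

def play(agent_a, agent_b):
    """Show each agent the other's source and collect their moves."""
    src_a, src_b = inspect.getsource(agent_a), inspect.getsource(agent_b)
    return agent_a(src_a, src_b), agent_b(src_b, src_a)

print(play(clique_bot, clique_bot))  # ('C', 'C'): mutual inspection yields cooperation
print(play(clique_bot, defect_bot))  # ('D', 'D'): the would-be exploiter is defected against
```

The toy only shows why unverifiable or hostile source code forfeits cooperation; a paperclipper can still cooperate whenever that is what produces more paperclips, as is pointed out further down the thread.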
An Earth-originating paperclipper that gets squashed by other superintelligences from somewhere else is still very bad for humans.
Though I don’t see why a paperclipper couldn’t compromise and cooperate with competing superintelligences, including ones with different goals. If other AIs are a problem for Clippy, they are also a problem for AIs that are Friendly towards humans but not necessarily friendly towards alien superintelligences.
That’s where Clippy might fail at viability—unless it’s the only maximizer around, that “kill everyone” strategy might catch the notice of entities capable of stopping it—entities that wouldn’t move against a friendlier AI.
Intended to be an illustration of how Clippy can do completely obvious things that don’t happen to be stupid, not a coded obligation. Clippy will of course do whatever is necessary to gain more paperclips. In the (unlikely) event that Clippy finds himself in a situation in which cooperation is a better maximisation strategy than simply outfooming, then he will obviously cooperate.
It isn’t absolute non-viability, but the odds are worse for an AI which won’t cooperate unless it sees a good reason to do so than for an AI which cooperates unless it sees a good reason not to.
Rationalists win. Rational paperclip maximisers win, then make paperclips.
Fair point, but the assumption that it is indeed possible to verify source code is far from proven. There are too many unknowns in cryptography to make strong claims about what strategies are possible, let alone which would be successful.
And we’ve got to assume AIs would be awfully good at steganography.
Did this ever happen? I would love to read such an article, although I’m pretty sure your position is wrong here.