A paperclip maximizer will maximize paperclips. I am unable to distinguish any sense in which this is a good thing. Why should I use the word “should” to describe this, when “will” serves exactly as well?
Please amplify on that. I can sorta guess what you mean, but can’t be sure.
We make a distinction between the concepts of what people will do and what they should do. Is there an analogous pair of concepts applicable to paperclip maximizers? Why or why not? If not, what is the difference between people and paperclip maximizers that justifies there being this difference for people but not for paperclip maximizers?
A paperclip maximizer will maximize paperclips.
Will paperclip maximizers, when talking about themselves, distinguish between what they will do and what will maximize paperclips? (While wishing they were better paperclip maximizers than they are.) What they will actually do is distinct from what will maximize paperclips: it’s predictable that actual performance falls short of optimal, provided the problem is open-ended enough.
Let there be a mildly insane (after the fashion of a human) paperclipper named Clippy.
Clippy does A. Clippy would do B if it were a sane but bounded rationalist, C if it were an unbounded rationalist, and D if it had perfect veridical knowledge. That is, D is the actual paperclip-maximizing action, C is the theoretically optimal action given all of Clippy’s knowledge, and B is as close to C as a bounded rationalist can realistically come.
Is B, C, or D what Clippy Should(Clippy) do? This is a reason to prefer “would-want”. Though I suppose a similar question applies to humans. Still, what Clippy should do is give up paperclips and become an FAI. There’s no chance of arguing Clippy into that, because Clippy doesn’t respond to what we consider a moral argument. So what’s the point of talking about what Clippy should do, since Clippy’s not going to do it? (Nor is it going to do B, C, or D, just A.)
PS: I’m also happy to talk about what it is rational for Clippy to do, referring to B.
Your usage of ‘should’ is more of a redefinition than a clarification. B, C, and D work as clarifications for the usual sense of the word: “should” has a feel ‘meta’ enough to transfer over to more kinds of agents.
If you can equally well talk of Should(Clippy) and Should(Humanity), then for the purposes of FAI it’s Should that needs to be understood, not one particular sense should=Should(Humanity). If one can’t explicitly write out Should(Humanity), one should probably write out Should(-), which is featureless enough for there to be no problem with the load of detailed human values, and in some sense pass Humanity as a parameter to its implementation. Do you see this framing as adequate or do you know of some problem with it?
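The Should(-) framing can be caricatured as a higher-order function: one featureless procedure that takes the agent (Humanity, Clippy, …) as a parameter, rather than baking a particular value system in. A minimal sketch of just that parameterization, with entirely hypothetical names, and no pretense of capturing real preference extrapolation:

```python
from typing import Callable

# An "agent" here is caricatured as a bundle of revealed choices: a function
# that, given a list of options, picks one. All names are illustrative only.
Agent = Callable[[list[str]], str]

def should(agent: Agent, options: list[str]) -> str:
    """Featureless Should(-): defers entirely to the agent passed in.

    A real proposal would extrapolate preferences ("would-want"), not merely
    read off current choices; this stub only illustrates the parameterization.
    """
    return agent(options)

# Clippy, passed in as the parameter: prefers whatever mentions paperclips most.
clippy: Agent = lambda opts: max(opts, key=lambda o: o.count("paperclip"))

print(should(clippy, ["make paperclips", "write poetry"]))  # make paperclips
```

The point of the sketch is only that `should` itself contains no Clippy-specific (or human-specific) detail; everything agent-specific arrives through the parameter.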
This is a good framing for explaining the problem—you would not, in fact, try to build the same FAI for Clippies and humans, and then pass it humans as a parameter.
E.g. structural complications of human “should” that only the human FAI would have to be structurally capable of learning. (No, you cannot have complete structural freedom because then you cannot do induction.)
I expect you would build the same FAI for paperclipping (although we don’t have any Clippies to pass it as a parameter), so I’d appreciate it if you did explain the problem, given you believe there is one, since it’s a direction in which I’m currently working.
Humans are stuff, just like any other feature of the world that FAI would optimize, and at the stuff-level it makes no difference that people prefer to be “free to optimize”. You are “free to optimize” even in a deterministic universe; it’s the way this stuff is (being) arranged that makes the difference, and it’s the content of human preference that says the world shouldn’t have certain features, like undeserved million-dollar bags falling from the sky (where “undeserved” is itself another function of stuff). An important subtlety of preference is that it makes different features of perhaps mutually exclusive possible scenarios depend on each other. So the fact that one should care about what could be, how it relates to what could be otherwise, and even to how it’s chosen what to actually realize, concerns the scope of what preference describes, not a specific instance of preference. In a manner of speaking: it says you need an Int32, not a Bool, to hold this variable, and that an Int32 seems big enough.
Furthermore, the kind of dependence you described in the post you linked seems fundamental from a certain logical standpoint, for any system (not even just “AI”). If you build the ontology of FAI on its epistemology, that is, you don’t consider it as already knowing anything but only as having a program that could interact with anything, then the possible futures and its own decision-making are already there (and that’s all there is, from its point of view). All it can do, on this conceptual level, is craft proofs (plans, designs of actions) that have certain internal dependencies in them, with the AI itself being the “current snapshot” of what it’s planning. That’s enough to handle the “free to optimize” requirement, given the right program.
Hmm, I’m essentially arguing that a universal-enough FAI is “computable”: that there is a program that computes a FAI for any given “creature”, within a certain class of “creatures”. I guess the question is void as stated: for a small enough class the problem is in principle solvable, and for a big enough class it’ll hit problems, if not conceptual then practical.
So the real question is about the characteristics of the class of systems for which it’s easiest to build an abstract FAI, that is, a tool that takes a specimen of this class as a parameter and becomes a custom-made FAI for that specimen. This class needs to at least include humanity, and given the size of humanity’s values, it needs to also include a lot of other stuff, for the tool itself to be small enough to program explicitly. I currently expect the parameter class of a manageable abstract FAI implementation to include even rocks and trees, since I don’t see how to rigorously define, and use in FAI theory, the difference between those systems and us.
This also takes care of the human values/humanity’s values divide: these are just different systems to parameterize the FAI with, so there is no need for a theory of “value overlaps” distinct from a theory of “systems’ values”. A separate issue is that “humanity” will probably be a bit harder to specify as a parameter than some specific human or group of people.
Re: I suppose a similar question applies to humans.
Indeed—this objection is the same for any agent, including humans.
It doesn’t seem to follow that the “should” term is inappropriate. If this is a reason for objecting to the “should” term, then the same argument concludes that it should not be used in a human context either.
Why should I use the word “should” to describe this, when “will” serves exactly as well?
‘Will’ does not serve exactly as well when considering agents with limited optimisation power (that is, any actual agent). Consider, for example, a Paperclip Maximiser that happens to be less intelligent than I am. I may be able to predict that Clippy will colonize Mars before he invades Earth, but also be quite sure that more paperclips would be formed if Clippy invaded Earth first. In this case I will likely want a word that means “would better serve to maximise the agent’s expected utility even if the agent does not end up doing it”.
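The distinction wanted here can be phrased computationally: the action that maximizes the agent’s expected utility need not equal the action the agent will actually take. A toy sketch with invented numbers (nothing here is a real model of Clippy):

```python
# Hypothetical expected utilities (paperclips produced) for each plan.
expected_paperclips = {
    "invade Earth first": 1_000_000,
    "colonize Mars first": 600_000,
}

# What we predict the limited agent will in fact do: the "will".
predicted_action = "colonize Mars first"

# The generic "should": the action maximizing the agent's expected utility.
optimal_action = max(expected_paperclips, key=expected_paperclips.get)

print(optimal_action)                      # invade Earth first
print(optimal_action == predicted_action)  # False: "will" and "should" come apart
```

For a perfect optimizer the two coincide and ‘will’ suffices; for any bounded agent they can diverge, which is exactly the gap the extra word labels.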
One option is to take ‘should’ and make it the generic ‘should’. I’m not saying you should use ‘should’ (implicitly, Should(Clippy)) to describe the action that Clippy would take if he had sufficient optimisation power. But I am saying that ‘will’ does not serve exactly as well.
I use “would-want” to indicate extrapolation. I.e., A wants X but would-want Y. This helps to indicate the implicit sensitivity to the exact extrapolation method, and that A does not actually represent a desire for Y at the current moment, etc. Similarly, A does X but would-do Y, A chooses X but would-choose Y, etc.
“Should” is a standard word for indicating moral obligation—it seems only sensible to use it in the context of other moral systems.
It’s a good thing—from their point of view. They probably think that there should be more paperclips. The term “should” makes sense in the context of a set of preferences.
No, it’s a paperclip-maximizing thing. From their point of view, and ours. No disagreement. They just care about what’s paperclip-maximizing, not what’s good.
This is not a real point of disagreement.
IMO, in this context, “good” just means “favoured by this moral system”. An action that “should” be performed is just one that would be morally obligatory—according to the specified moral system. Both terms are relative to a set of moral standards.
I was talking as though a paperclip maximiser would have morals that reflected their values. You were apparently assuming the opposite. Which perspective is better would depend on which particular paperclip maximiser was being examined.
Personally, I think there are often good reasons for morals and values being in tune with one another.