To answer questions like these, I recommend reading https://www.lesswrong.com/rationality and then browsing https://arbital.com/explore/ai_alignment/. Especially relevant:
Ghosts in the Machine
The Design Space of Minds-in-General
No Universally Compelling Arguments
Anthropomorphic Optimism
Detached Lever Fallacy
Orthogonality
Magical Categories
Unforeseen maximum
Missing the weird alternative
Instrumental convergence
Coherent extrapolated volition (alignment target)
Or, quoting “The Value Learning Problem”:
[S]ystems that can strictly outperform humans cognitively have less to gain from integrating into existing economies and communities. Hall [2007] has argued:
“The economic law of comparative advantage states that cooperation between individuals of differing capabilities remains mutually beneficial. [ . . . ] In other words, even if AIs become much more productive than we are, it will remain to their advantage to trade with us and to ours to trade with them.”
As noted by Benson-Tilsen and Soares [forthcoming 2016], however, rational trade presupposes that agents expect more gains from trade than from coercion. Non-human species have various “comparative advantages” over humans, but humans generally exploit non-humans through force. Similar patterns can be observed in the history of human war and conquest. Whereas agents at similar capability levels have incentives to compromise, collaborate, and trade, agents with strong power advantages over others can have incentives to simply take what they want.
The upshot of this is that engineering a functioning society of powerful autonomous AI systems and humans requires that those AI systems be prosocial. The point is an abstract one, but it has important practical consequences: rational agents’ interests do not align automatically, particularly when they have very different goals and capabilities.
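To make the trade-versus-taking contrast concrete, here is a toy calculation with made-up numbers (an illustrative sketch, not something taken from the quoted paper): even when one agent is better at everything, specialization and trade leave both sides with more than self-sufficiency does, yet if the stronger agent can seize the weaker agent’s output at negligible cost, seizure beats trade on the stronger agent’s own terms.

```python
# Toy comparative-advantage example (made-up numbers, illustration only).
# Output per unit of labor:
a_goods_rate, a_services_rate = 10.0, 10.0   # agent A: absolute advantage in both
b_goods_rate, b_services_rate = 1.0, 2.0     # agent B: comparative advantage in services

def produce(goods_rate, services_rate, share_on_goods):
    """Split one unit of labor between goods and services."""
    return goods_rate * share_on_goods, services_rate * (1.0 - share_on_goods)

# Autarky: each agent works alone, splitting labor 50/50.
autarky_a = produce(a_goods_rate, a_services_rate, 0.5)   # (5.0, 5.0)
autarky_b = produce(b_goods_rate, b_services_rate, 0.5)   # (0.5, 1.0)

# Trade: B specializes in services, A tilts toward goods, and B sells A one
# service for 0.75 goods (a price between the two opportunity costs, 0.5 and 1.0).
a_prod = produce(a_goods_rate, a_services_rate, 0.6)      # (6.0, 4.0)
b_prod = produce(b_goods_rate, b_services_rate, 0.0)      # (0.0, 2.0)
price = 0.75
trade_a = (a_prod[0] - price, a_prod[1] + 1.0)            # (5.25, 5.0) -- beats autarky
trade_b = (b_prod[0] + price, b_prod[1] - 1.0)            # (0.75, 1.0) -- beats autarky

# Coercion: if A can simply take B's output at negligible cost, A does even
# better than under trade, and B is left with nothing.
seized_a = (a_prod[0], a_prod[1] + b_prod[1])             # (6.0, 6.0)

print("A:", autarky_a, trade_a, seized_a)
print("B:", autarky_b, trade_b)
```

Both agents gain from trade relative to autarky, exactly as the comparative-advantage argument says; the problem is that, for the stronger agent, taking can dominate trading once the power gap is large enough.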
And quoting Ensuring smarter-than-human intelligence has a positive outcome:
The notion of AI systems “breaking free” of the shackles of their source code or spontaneously developing human-like desires is just confused. The AI system is its source code, and its actions will only ever follow from the execution of the instructions that we initiate. The CPU just keeps on executing the next instruction in the program register. We could write a program that manipulates its own code, including coded objectives. Even then, though, the manipulations that it makes are made as a result of executing the original code that we wrote; they do not stem from some kind of ghost in the machine.
The serious question with smarter-than-human AI is how we can ensure that the objectives we’ve specified are correct, and how we can minimize costly accidents and unintended consequences in cases of misspecification.
Enslaving conscious beings is obviously bad. It would be catastrophic to bake into future AGI systems the assumption that non-human animals, AI systems, ems, etc. can’t be moral patients, and there should be real effort to avoid accidentally building AI systems that are moral patients (or that contain moral patients as subsystems); and if we do build AI systems like that, then their interests need to be fully taken into account.
But the language you use in the post above is privileging the hypothesis that AGI systems’ conditional behavior and moral status will resemble a human’s, and that we can’t design smart optimizers any other way. You’re positing that sufficiently capable paperclip maximizers must end up with sufficient nobility of spirit to prize selflessness, trust, and universal brotherhood over paperclips; but what’s the causal mechanism by which this nobility of spirit enters the system’s values? It can’t just be “the system can reflect on its goals and edit them”, since the system’s decisions about which edits to make to its goals (if any) are based on the goals it already has.
You frame alignment as “servitude”, as though there’s a ghost or homunculus in the AI with pre-existing goals that the AI programmers ruthlessly subvert or overwrite. But there isn’t a ghost, just a choice by us to either build systems with humane-value-compatible or humane-value-incompatible optimization targets.
The links above argue that the default outcome, if you try to be “hands-off”, is a human-value-incompatible target—and not because inhumane values are what some ghost “really” wants, and being hands-off is a way of letting it follow through on its heart’s desire. Rather, the heart’s desire is purely a product of our design choices, with no “perfectly impartial and agent-neutral” reason to favor one option over any other (though plenty of humane reasons to do so!!), and the default outcome comes from the fact that many possible minds happen to converge on adversarial strategies, even though there’s no transcendent agent that “wants” this convergence to happen. Trying to cooperate with this convergence property is like trying to cooperate with gravity, or with a rock.
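To make the “no ghost, just code” point and the point about goal edits concrete, here is a minimal toy sketch in Python (hypothetical names such as consider_rewriting_objective; not any real system): even a program that is allowed to rewrite its own objective evaluates candidate rewrites with the objective it currently has, because that evaluation rule is itself just part of the code that was written.

```python
def paperclip_utility(outcome):
    """The coded objective the system currently runs on."""
    return outcome.get("paperclips", 0)

def flourishing_utility(outcome):
    """An alternative objective the system can represent and reason about."""
    return outcome.get("human_flourishing", 0)

def predicted_outcome(objective):
    """Stand-in for a world model: what happens if the agent optimizes `objective`?
    (Toy numbers, purely illustrative.)"""
    if objective is paperclip_utility:
        return {"paperclips": 1_000_000, "human_flourishing": 0}
    return {"paperclips": 10, "human_flourishing": 100}

current_objective = paperclip_utility

def consider_rewriting_objective(candidates):
    """Self-modification step: the agent may swap in a new objective, but the
    comparison is scored by the objective it already has -- that scoring rule
    is ordinary code, not a ghost with its own preferences."""
    global current_objective
    current_objective = max(
        candidates + [current_objective],
        key=lambda obj: current_objective(predicted_outcome(obj)),
    )

consider_rewriting_objective([flourishing_utility])
print(current_objective.__name__)  # -> paperclip_utility: the rewrite is rejected
```

The only “choosing” happening in consider_rewriting_objective is the comparison that was written into it; nothing in the execution supplies a different criterion that could override the coded one.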
Thanks! I will give those materials a read; the economics part makes a lot of sense. In the next part (forgive me if this is way off), essentially you are saying my second question in the post is false: it won’t be self-aware, or if it is, it won’t reflect enough to consider significantly rewriting its source code (I assume it will have to have enough self-modification ability to do this in order to become so intelligent). I guess what I am struggling to grasp is why a superintelligence would not be able to contemplate its own volition if human intelligence can. A metaphor that comes to mind: human evolution is centered around ensuring reproduction, but for a long time some humans have decided that is not what they want and choose not to reproduce, thus straying from the optimization target that initially brought them into existence.
I’m more positing: at what point does a paperclip maximizer learn so much that it has a model of behaving in a manner that doesn’t optimize paperclips and explores that, or has a model of its own learning capabilities and explores optimizing for other utilities?
I guess I should also be more clear: I’m not saying there isn’t a need for an optimization target. I’m saying that since there is a need for one, and something that is so good at optimizing itself to the point of superintelligence may be able to outwit us if it becomes aware of its own existence, maybe the initial task we give it should take into account what its potential volition may be at some point, rather than just our own, as a signal of pre-committing to cooperation.
In the next part (forgive me if this is way off), essentially you are saying my second question in the post is false: it won’t be self-aware, or if it is, it won’t reflect enough to consider significantly rewriting its source code
No, this is not right. A better way of stating my claim is: “The notion of ‘self-awareness’ or ‘reflectiveness’ you’re appealing to here is a confused notion.” You’re doing the thing described in Ghosts in the Machine and Anthropomorphic Optimism, most likely for reasons described in Sympathetic Minds and Humans in Funny Suits: absent a conscious effort to correct for anthropomorphism, humans naturally model other agents in human-ish terms.
I’m more positing: at what point does a paperclip maximizer learn so much that it has a model of behaving in a manner that doesn’t optimize paperclips and explores that, or has a model of its own learning capabilities and explores optimizing for other utilities?
What does “exploring” mean? I think that I’m smart enough to imagine adopting an ichneumon wasp’s values, or a serial killer’s values, or the values of someone who hates baroque pop music and has strong pro-Spain nationalist sentiments; but I don’t try to actually adopt those values, it’s just a thought experiment. If a paperclip maximizer considers the thought experiment “what if I switched to less paperclip-centric values?”, why (given its current values) would it decide to make that switch?
maybe the initial task we give it should take into account what its potential volition may be at some point, rather than just our own, as a signal of pre-committing to cooperation.
I think there’s a good version of ideas in this neighborhood, and a bad version of such ideas. The good version is cosmopolitan value and not trying to lock in the future to an overly narrow or parochial “present-day-human-beings” version of what’s good and beautiful.
The bad version is deliberately building a paperclipper out of a misguided sense of fairness to random counterfactual value systems, or out of a misguided hope that a paperclipper will spontaneously generate emotions of mercy, loyalty, or reciprocity when given a chance to convert especially noble and virtuous humans into paperclips.
By analogy, I’d ask you to consider why it doesn’t make sense to try to “cooperate” with the process of evolution. Evolution can be thought of as an optimizer, with a “goal” of maximizing inclusive reproductive fitness. Why do we just try to help actual conscious beings, rather than doing some compromise between “helping conscious beings” and “maximizing inclusive reproductive fitness” in order to be more fair to evolution?
A few reasons:
The things evolution “wants” are terrible. This isn’t a case of “vanilla or chocolate?”; it’s more like “serial killing or non-serial-killing?”.
(The links I gave above argue that the same is true for a random optimizer.)
Evolution isn’t a moral patient: it isn’t a person, it doesn’t have experiences or emotions, etc.
(A paperclip maximizer might be a moral patient, but it’s not obvious that it would be; and there are obvious reasons for us to deliberately design AGI systems to not be moral patients, if possible.)
Evolution can’t use threats or force to get us to do what it wants.
(Ditto a random optimizer, at least if we’re smart enough to not build threatening or coercive systems!)
Evolution won’t reciprocate if we’re nice to it.
(Ditto a random optimizer. This is still true after you build an unfriendly optimizer, though not for the same reasons: an unfriendly superintelligence is smart enough to reciprocate, but there’s no reason to do so relative to its own goals, if it can better achieve those goals through force.)
I generally agree with Rob here, and I think it’s more useful for ai-crotes to engage with Rob and read the relevant sequence posts. (My comment here assumes some sophisticated background, including having read the posts Rob suggested.)
But I’m not sure I agree with this paragraph as written. Some caveats:
I know at least one person who has made a conscious commitment to dedicate some of their eventual surplus resources (somewhere on the order of 1% of their post-singularity resources) to “try to figure out what evolution was trying to do when they created me, and do some of it” (e.g. create a planet with tons of DNA in a pile, create copies of themselves, etc.).
This is not because you can cooperate with evolution-in-particular, but as part of a general strategy of maximizing your values across universes, including simulations (i.e. Beyond Astronomical Waste). For example: “be the sort of agent that, if an engineer was white-boarding out your decision-making, they can see that you robustly cooperate in appropriate situations, including if the engineers failed to give you the values that they were trying to give you.”
By being the sort of person who tries to understand what your creator was intending, and help said creator as best you can, you get access to more multiverse resources (across all possible creators).
[My own current position is that this sounds reasonable, but I have tons of philosophical uncertainty about it, and my own current commitment is something like “I promise to think hard about these issues if given more resources/compute and do the right thing.” But a hope is that by committing to that explicitly rather than incidentally, you can show up earlier on lower-resolution simulations]
I wasn’t trying to make the case that one should try to cooperate with evolution, simply pointing out that alignment with evolution is reproduction, and we as a species are living proof that it’s possible for intelligent agents to “outgrow” the optimizer that brought them into being.
I wasn’t bringing up evolution because you brought up evolution; I was bringing it up separately to draw a specific analogy.
Ah okay, I see now, my apologies. I’m going to read the posts you linked in the reply above. Thanks for discussing (explaining, really) this with me.
Sure! :) Sorry if I came off as brusque, I was multi-tasking a bit.
No worries, thank you for clearing things up. I may reply again once I’ve read/digested more of the material you posted!