Thanks! I will give those materials a read; the economics part makes a lot of sense. In the next part (forgive me if this is way off), you are essentially saying that my second question in the post is false: it won’t be self-aware, or if it is, it won’t reflect enough to consider significantly rewriting its source code (I assume it will need enough self-modification ability to do this in order to become so intelligent). I guess what I am struggling to grasp is why a superintelligence would not be able to contemplate its own volition if human intelligence can. A metaphor that comes to mind: human evolution is centered around ensuring reproduction, but for a long time some humans have decided that is not what they want and have chosen not to reproduce, thus straying from the optimization target that initially brought them into existence.
I’m more asking at what point a paperclip maximizer learns so much that it has a model of behaving in a manner that doesn’t optimize paperclips and explores that, or has a model of its own learning capabilities and explores optimizing for other utilities.
I guess I should also be more clear: I’m not saying there isn’t a need for an optimization target. I’m saying that since there is a need for one, and since something so good at optimizing itself that it reaches superintelligence may be able to outwit us once it becomes aware of its existence, maybe the initial task we give it should take into account what its potential volition may be at some point, rather than just our own, as a pre-signal of pre-committing to cooperation.
No, this is not right. A better way of stating my claim is: “The notion of ‘self-awareness’ or ‘reflectiveness’ you’re appealing to here is a confused notion.” You’re doing the thing described in Ghosts in the Machine and Anthropomorphic Optimism, most likely for reasons described in Sympathetic Minds and Humans in Funny Suits: absent a conscious effort to correct for anthropomorphism, humans naturally model other agents in human-ish terms.
What does “exploring” mean? I think that I’m smart enough to imagine adopting an ichneumon wasp’s values, or a serial killer’s values, or the values of someone who hates baroque pop music and has strong pro-Spain nationalist sentiments; but I don’t try to actually adopt those values, it’s just a thought experiment. If a paperclip maximizer considers the thought experiment “what if I switched to less paperclip-centric values?”, why (given its current values) would it decide to make that switch?
I think there’s a good version of ideas in this neighborhood, and a bad version of such ideas. The good version is cosmopolitan value and not trying to lock in the future to an overly narrow or parochial “present-day-human-beings” version of what’s good and beautiful.
The bad version is deliberately building a paperclipper out of a misguided sense of fairness to random counterfactual value systems, or out of a misguided hope that a paperclipper will spontaneously generate emotions of mercy, loyalty, or reciprocity when given a chance to convert especially noble and virtuous humans into paperclips.
By analogy, I’d ask you to consider why it doesn’t make sense to try to “cooperate” with the process of evolution. Evolution can be thought of as an optimizer, with a “goal” of maximizing inclusive reproductive fitness. Why do we just try to help actual conscious beings, rather than doing some compromise between “helping conscious beings” and “maximizing inclusive reproductive fitness” in order to be more fair to evolution?
A few reasons:
The things evolution “wants” are terrible. This isn’t a case of “vanilla or chocolate?”; it’s more like “serial killing or non-serial-killing?”.
(The links I gave above argue that the same is true for a random optimizer.)
Evolution isn’t a moral patient: it isn’t a person, it doesn’t have experiences or emotions, etc.
(A paperclip maximizer might be a moral patient, but it’s not obvious that it would be; and there are obvious reasons for us to deliberately design AGI systems to not be moral patients, if possible.)
Evolution can’t use threats or force to get us to do what it wants.
(Ditto a random optimizer, at least if we’re smart enough to not build threatening or coercive systems!)
Evolution won’t reciprocate if we’re nice to it.
(Ditto a random optimizer. This is still true after you build an unfriendly optimizer, though not for the same reasons: an unfriendly superintelligence is smart enough to reciprocate, but there’s no reason to do so relative to its own goals, if it can better achieve those goals through force.)
I generally agree with Rob here (and I think it’s more useful for ai-crotes to engage with Rob and read the relevant sequence posts. My comment here assumes some sophisticated background, including reading the posts Rob suggested).
But, I’m not sure I agree with this paragraph as written. Some caveats:
I know at least one person who has made a conscious commitment to dedicate some of their eventual surplus resources (i.e. somewhere on the order of 1% of their post-singularity resources) to “try to figure out what evolution was trying to do when they created me, and do some of it” (e.g. create a planet with tons of DNA in a pile, create copies of themselves, etc.).
This is not because you can cooperate with evolution-in-particular, but as part of a general strategy of maximizing your values across universes, including simulations (i.e. Beyond Astronomical Waste). For example: “be the sort of agent that, if an engineer were white-boarding out your decision-making, they could see that you robustly cooperate in appropriate situations, including if the engineers failed to give you the values that they were trying to give you.”
By being the sort of person who tries to understand what your creator was intending, and help said creator as best you can, you get access to more multiverse resources (across all possible creators).
[My own current position is that this sounds reasonable, but I have tons of philosophical uncertainty about it, and my own current commitment is something like “I promise to think hard about these issues if given more resources/compute and do the right thing.” But a hope is that by committing to that explicitly rather than incidentally, you can show up earlier on lower-resolution simulations]
I wasn’t trying to make the case that one should try to cooperate with evolution; I was simply pointing out that alignment with evolution is reproduction, and that we as a species are living proof that it’s possible for intelligent agents to “outgrow” the optimizer that brought them into being.
I wasn’t bringing up evolution because you brought up evolution; I was bringing it up separately to draw a specific analogy.
Ah okay, I see now, my apologies. Gonna read the posts you linked in the upper reply. Thanks for discussing (explaining, really) this with me.
Sure! :) Sorry if I came off as brusque, I was multi-tasking a bit.
No worries, thank you for clearing things up. I may reply again once I’ve read/digested more of the material you posted!