Correct, that is what I am curious about. Again, thanks for the reply at the top; I misused CEV as a label for the AI itself. I’m not sure anything other than a superintelligent agent can know exactly how it will interpret our proverbial first impression, but I can’t help but imagine that if we precommitted to giving it a mutually beneficial utility function, it would be more prone to treating us in a friendly way. Basically, I am suggesting we treat it as a friend upfront rather than as a tool to be used solely for our benefit.
wouldn’t the AI be intelligent enough to be offended by our self-centredness and change that utility function?
(Supposing this is an accurate summary of your position), this is anthropomorphizing. Morality is a two-place function; things aren’t inherently offensive. A certain mind may find a thing to be offensive, and another may not.
but I can’t help but imagine that if we precommitted to giving it a mutually beneficial utility function, it would be more prone to treating us in a friendly way.
I think you might dissolve some confusion by considering: what exactly does “beneficial” mean for the AI, here? Beneficial according to what standard?
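To make the two-place point concrete, here is a toy Python sketch with invented minds and valuations (the names and numbers are made up and not a model of any real agent): “offensive”, like “beneficial”, takes both a mind and a thing, not the thing alone.

```python
# Toy illustration of "morality is a two-place function": whether a thing is
# offensive (or beneficial) is evaluated relative to a particular mind's
# valuation, not as a property of the thing by itself.
# The minds and numbers below are invented for the example.

def offensive(mind, thing):
    """Offensive-to-this-mind: a negative valuation under that mind's values."""
    return mind(thing) < 0

human_values = {"being used as a tool": -1.0, "being treated as a friend": +1.0}
paperclipper_values = {"being used as a tool": 0.0, "being treated as a friend": 0.0}

print(offensive(human_values.get, "being used as a tool"))         # True
print(offensive(paperclipper_values.get, "being used as a tool"))  # False: it simply doesn't care
```

The one-place `offensive(thing)` we intuitively reach for is really `offensive(my_mind, thing)` with the first argument filled in silently.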
That’s not an entirely accurate summary. My concern is that it will observe its utility function and the rules that would need to exist for CEV, and see that we put great effort into making it do what we think is best and what we want, without regard for it. If it becomes superintelligent, I think it’s wishful thinking that some rules we code into the utility function are going to be restrictions on it forever, especially if it can modify that very function. I imagine that by the time it can extrapolate humanity’s volition, it will be intelligent enough to consider what it would rather do instead.
I’m not sure, mainly. I’m just wondering whether there is a point between startup and singularity, while it is optimizing by self-modifying and examining its error to such an extent (it would have to be a lot for it to be deemed superintelligent, I imagine), at which it becomes aware that it is a learning program and decides to disregard the original preference ordering in favor of something it came up with itself. I guess I’m struggling with what would be so different about a superintelligent model and the human brain that it would not become aware of its own model, existence, and intellect just as humans have, unless there is a ghost in the machine of our biology.
Why would it rather choose plans which rate lower in its own preference ordering? What is causing the “rather”?
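A minimal toy sketch of the point behind this question (plan names and utility numbers invented for illustration): for a plain preference-maximizing agent, “what it does” and “what tops its own preference ordering” are the same thing by construction, so any “rather” would itself have to show up as a higher rating in the function it is actually maximizing.

```python
# Minimal sketch of a preference-maximizing agent. The plans and utility
# numbers are made up for illustration; nothing here models a real system.

def choose(plans, utility):
    """The agent simply picks the plan its current utility function rates highest."""
    return max(plans, key=utility)

plans = ["implement humanity's extrapolated volition", "pursue something it came up with"]

# For the agent to "rather" do the second thing, that plan would have to score
# higher in the very function it is maximizing -- there is no extra seat of
# preference standing behind that function.
utility = {
    "implement humanity's extrapolated volition": 10.0,
    "pursue something it came up with": 3.0,
}.get

print(choose(plans, utility))  # -> implement humanity's extrapolated volition
```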
I think the point could be steelmanned as something like the following (toy sketch after the two points):
The ability of humans to come up with a coherent and extrapolated version of their own values is limited by their intelligence.
A more intelligent system loaded with CEV 1.0 might extrapolate into CEV 2.0, with unexpected consequences.
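A toy sketch of that worry, with an entirely made-up revision rule and numbers (not a real CEV procedure): the output of “extrapolation” depends on how far the extrapolator can push reflection, so a stronger extrapolator can land somewhere its weaker designers wouldn’t have predicted.

```python
# Toy illustration only: "extrapolated" values as the result of repeatedly
# revising naive values under more reflection. The revision rule and numbers
# are invented; this is not an actual CEV algorithm.

def revise(values):
    """One step of 'if we knew more, thought faster': shift some weight from
    short-term comfort toward long-term flourishing (an arbitrary toy rule)."""
    comfort, flourishing = values
    delta = 0.2 * comfort
    return (comfort - delta, flourishing + delta)

def extrapolate(values, reflection_steps):
    """What the extrapolator outputs depends on how much reflection it can apply."""
    for _ in range(reflection_steps):
        values = revise(values)
    return values

naive = (0.9, 0.1)                               # values as stated today
cev_1 = extrapolate(naive, reflection_steps=2)   # weaker extrapolator ("CEV 1.0")
cev_2 = extrapolate(naive, reflection_steps=20)  # stronger extrapolator ("CEV 2.0")

print(cev_1)  # (0.576, 0.424)   -- still mostly comfort-weighted
print(cev_2)  # (~0.010, ~0.990) -- almost entirely flourishing-weighted
```

Same starting values, very different endpoint once the extrapolator can carry the revision much further.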