This is not what I meant by “the same values”, but the comment points towards an interesting point.
When I say “the same values”, I mean the same utility function, as a function over the state of the world (and the states of “R is having sex” and “H is having sex” are different).
The interesting point is that states need to be inferred from observations, and it seems like there are some fundamentally hard issues around doing that in a satisfying way.
Related to the distinction between 2-place and 1-place words. We want the AI to have the “curried” version of human values, not a symmetric version where the word “me” now refers to the AI itself.
Can you please explain the distinction more succinctly, and say how it is related?
In that case, “paradise for R” and “paradise for H” are different. You need to check out “centered worlds”.
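To make the curried-vs-recentered distinction concrete, here is a minimal Python sketch. Everything in it (the `two_place_values` function, the toy world dictionaries, and the agent labels "H" and "R") is an illustrative assumption, not something specified in the discussion above:

```python
# Toy illustration: a 2-place value function over (agent, world-state),
# and the 1-place "curried" versions obtained by fixing the agent slot.

def two_place_values(agent, world):
    """Toy 2-place values: the agent cares about worlds in which it, specifically, flourishes."""
    return 1.0 if world.get(agent) == "flourishing" else 0.0

def curried_for(agent):
    """Fix the agent slot once and for all, yielding a 1-place function over world states."""
    return lambda world: two_place_values(agent, world)

values_H = curried_for("H")           # the human's values, with "me" bound to the human H
values_recentered = curried_for("R")  # the wrong target: same 2-place function, "me" re-bound to the AI R

world_good_for_H = {"H": "flourishing", "R": "idle"}
world_good_for_R = {"H": "idle", "R": "flourishing"}

print(values_H(world_good_for_H), values_H(world_good_for_R))                    # 1.0 0.0
print(values_recentered(world_good_for_H), values_recentered(world_good_for_R))  # 0.0 1.0
```

In this sketch the curried function `values_H` ranks worlds by how things go for H no matter which agent is running it, while the re-centered version ranks the same two worlds the opposite way, which is the failure mode discussed in the rest of the thread.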
In your definition this distinction about the state of the world is not obvious, since humans usually use the words “have the same values” not to mean the same function over world states, but to mean the same set of preferences centered around a different person.
In that case, the situation becomes creepy. For example, I want to have sex with human females. An AI holistically aligned with me will also want to have sex with human females, but I am not happy about that, and the females will also find it creepy. Moreover, if such an AI has 1000 times my capabilities, it will completely dominate the sexual market, getting almost all the sex in the world, and humans will go extinct.
An AI with avturchin-like values centered around the AI would completely dominate the sexual market if and only if avturchin would completely dominate the sexual market given the opportunity. More generally, having an AI with human-like values centered around itself is only as bad as having an AI holistically aligned (in the original, absolute sense) with a human who is not you.
As an aside, why would I find it creepy if a godlike superintelligence wants to have sex with me? It’s kinda hot actually :)
I could imagine the following conversation with a holistically aligned AI:
- Mom, I've decided to become homosexual.
- No, you will not, because heterosexuality was your terminal value at the moment of my creation.