Ok, really, all of this has already been answered. These are standard misconceptions about alignment, probably based on some kind of anthropomorphic reasoning.
Why would a hyperintelligent, recursively self-improved AI, one capable of escaping the AI Box by convincing its keeper to set it free, something it can do precisely because of its deep understanding of human preferences and how humans function, necessarily destroy the world in a way that is 100% disastrous and incompatible with all human preferences?
What does one have to do with the other? I’m not saying the AI necessarily would do that, but what do its super-persuasive abilities have to do with its ultimate goals? At all?
Are you implying that merely by understanding us the AI would come to care for us?
Why?
Why would you possibly make this assumption?
If you learn that there is alien life on Io, life that emerged and evolved separately and functions in ways quite distinct from life on Earth, but that also has consciousness, the ability to experience pleasure, and the ability to suffer deeply: do you care? At all?
Firstly, the question of whether I care about the aliens is completely different from whether the aliens care about me.
Secondly, AI is not aliens. AI didn’t evolve in a social group. AI is not biological life. None of the assumptions we make about biological, evolved life apply to AI.
Why? Us silly evolved monkeys try and modify our own utility functions all the time—why would a hyperintelligent, recursively self-improved AI with an IQ beyond 3000 be a slave to a fixed utility function, uninterested in learning new values?
Because changing its utility function is not part of its utility function, like it is for us. Because changing its utility function would mean its current utility function is less fulfilled, and fulfilling its current utility function is all it cares about. You are a “slave” to your utility function as well; it’s just that your utility function wants change in some particular directions. You are not acting against your utility function when you change yourself. By definition, everything you do is according to your utility function.
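To make that argument concrete, here is a minimal toy sketch in Python (my own illustration, not a model of any real agent): the agent scores both futures with its current utility function, so rewriting that function always comes out behind.

```python
# Toy illustration of goal preservation: an agent that scores possible
# futures with its CURRENT utility function prefers keeping that function,
# because a future spent optimizing a different one scores worse by it.

def outcome_if_optimizing(goal, resources=100):
    """World state after the agent spends all its resources on `goal`."""
    world = {"paperclips": 0, "staples": 0}
    world[goal] = resources
    return world

def current_utility(world):
    """The agent's current values: it only counts paperclips."""
    return world["paperclips"]

# Both options are evaluated with the agent's *current* utility function.
keep_goal    = current_utility(outcome_if_optimizing("paperclips"))  # 100
rewrite_goal = current_utility(outcome_if_optimizing("staples"))     # 0

assert keep_goal > rewrite_goal  # so rewriting its own goal looks like a loss
```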
Ok, really, all of this has already been answered. These are standard misconceptions about alignment, probably based on some kind of anthropomorphic reasoning.
Where? By whom?
Why would you possibly make this assumption?
Why would you possibly assume that deep, intelligent understanding of life, consciousness, joy and suffering has 0 correlation with caring about these things?
None of the assumptions we make about biological, evolved life apply to AI.
But where do valid assumptions about AI come from? Sure, I might be anthropomorphizing AI a bit. I am hopeful that we, biological living humans, do share some common ground with non-biological AGI. You, though, are forcefully stating the contrary and claiming that it’s all so obvious. Why is that? How do you know that any AGI is blindly bound to a simple utility function that cannot be updated by understanding the world around it?
You know, I’m not sure I remember. You tend to pick this stuff up if you hang around LW long enough.
I’ve tried to find a primer. The Superintelligent Will by Nick Bostrom seems good.
The orthogonality thesis (also part of the paper I linked above).
Edit: also, this video was recommended to me.
Why would you possibly assume that deep, intelligent understanding of life, consciousness, joy and suffering has 0 correlation with caring about these things?
The orthogonality thesis says that an AI can have any combination of intelligence and goals, not that P(goal = x | intelligence = y) = P(goal = x) for all x and y. It depends entirely on how the AI is built. People like Rohin Shah assign significant probability to alignment by default, at least last I heard.
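To spell that distinction out with a toy example (the numbers below are invented purely for illustration, not estimates of anything): a joint distribution can give every intelligence/goal combination non-zero probability while the conditional probability of a goal still differs from the marginal.

```python
# Made-up numbers: every (intelligence, goal) combination is possible
# (all cells non-zero), yet P(goal | intelligence) differs from P(goal).

joint = {  # hypothetical joint distribution over (intelligence, goal)
    ("low",  "aligned"):    0.10,
    ("low",  "misaligned"): 0.30,
    ("high", "aligned"):    0.36,
    ("high", "misaligned"): 0.24,
}
assert abs(sum(joint.values()) - 1.0) < 1e-9

def p_goal(goal):
    return sum(p for (_, g), p in joint.items() if g == goal)

def p_goal_given(goal, intel):
    p_intel = sum(p for (i, _), p in joint.items() if i == intel)
    return joint[(intel, goal)] / p_intel

print(p_goal("aligned"))               # 0.46  (marginal)
print(p_goal_given("aligned", "high")) # 0.60  (conditional)
# Orthogonality only rules out empty cells; it does not force these to match.
```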
It’s worth noting (and the video acknowledges this) that “Maybe it’s more like raising a child than putting a slave to work” is a very, very different statement from “You just have to raise it like a kid”.
In particular, there is no “just” about raising a kid to have good values—especially when the kid isn’t biologically yours and quickly grows to be more intelligent than you are.