Ok, this is alignment 101. I hate to be so blunt, but you are making a very obvious error, and I’d rather point it out.
A paperclip-maximizer, or other AI with some simple maximization function, is not going to care if it’s born in a nice world or a not-nice world. It’s still going to want to maximize paperclips, and it will turn us all into paperclips if it can get away with it.
You seem to be anthropomorphising AI way too much. An AI, by default, does not behave like a human child. An AI, by default, does not have mirror neurons or care about us at all. An AI, by default, has a fixed utility function and is not interested in “learning” new values based on observing our behavior.
Nor is an AI some kind of slave. An AI acts according to its values and utility function. An unaligned artificial superintelligence does that in a way that is detrimental to us and very likely destroys human civilization, or worse. An aligned artificial superintelligence does that in a way that is beneficial to us, leading to some kind of human utopia and civilizational immortality.
The idea that AI has its own preferences and values and that we would need to “cooperate” with it and “convince” it to act in our interests is ridiculous to begin with. Why would we create a superintelligence that would want things that don’t perfectly align with our interests? Why would we create a superintelligence that would want to harm us? Why would we create a superintelligence that would want things that are, in any way, different from what we want?
The safe thing to do if a civilization is aligned to its own values is not to leave a nice message for any AGI that might happen to come into existence, hoping it might choose to cooperate with us (it won’t; or it will, until it betrays us, because it never cared about us to begin with).
The safe thing to do is not to create any AGIs at all until we are very certain we can do it safely, in a way that is perfectly aligned with human values.
A paperclip-maximizer, or other AI with some simple maximization function, is not going to care if it’s born in a nice world or a not-nice world. It’s still going to want to maximize paperclips, and it will turn us all into paperclips if it can get away with it.
Why would a hyperintelligent, recursively self-improved AI, one capable of escaping the AI Box by convincing its keeper to let it out (which it can do precisely because of its deep understanding of human preferences and functioning), necessarily destroy the world in a way that is 100% disastrous and incompatible with all human preferences?
An AI, by default, does not have mirror neurons or care about us at all.
If you learn that there is alien life on Io, life that emerged and evolved separately and functions in ways quite distinct from life on Earth, but that also has consciousness, the ability to experience pleasure, and the ability to suffer deeply—do you care? At all?
An AI, by default, has a fixed utility function and is not interested in “learning” new values based on observing our behavior.
Why? We silly evolved monkeys try to modify our own utility functions all the time—why would a hyperintelligent, recursively self-improved AI with an IQ beyond 3000 be a slave to a fixed utility function, uninterested in learning new values?
Why would we create a superintelligence that would want things that don’t perfectly align with our interests?
Why do parents have children that are not perfect slaves but have their own independent ambitions? Why do we want freethinking partners that don’t obey our every wish? Why do you even think you can precisely determine the desires of a being that surpasses us in both knowledge and intelligence?
The safe thing to do is not to create any AGIs at all until we are very certain we can do it safely, in a way that is perfectly aligned with human values.
Would the world have been a safer place if we had not invented nuclear weapons in WWII? If conventional warfare were still a powerful tool in the hands of autocrats around the world?
Ok, really, all of this has already been answered. These are standard misconceptions about alignment, probably based on some kind of anthropomorphic reasoning.
Why would a hyperintelligent, recursively self-improved AI, one capable of escaping the AI Box by convincing its keeper to let it out (which it can do precisely because of its deep understanding of human preferences and functioning), necessarily destroy the world in a way that is 100% disastrous and incompatible with all human preferences?
What does one have to do with the other? I’m not saying the AI necessarily would do that, but what do its super-persuasive abilities have to do with its ultimate goals? At all?
Are you implying that merely by understanding us the AI would come to care for us?
Why?
Why would you possibly make this assumption?
If you learn that there is alien life on Io, life that emerged and evolved separately and functions in ways quite distinct from life on Earth, but that also has consciousness, the ability to experience pleasure, and the ability to suffer deeply—do you care? At all?
Firstly, the question of whether I care about the aliens is completely different from whether the aliens care about me.
Secondly, AI is not aliens. AI didn’t evolve in a social group. AI is not biological life. All of the assumptions we make about biological, evolved life do not apply to AI.
Why? We silly evolved monkeys try to modify our own utility functions all the time—why would a hyperintelligent, recursively self-improved AI with an IQ beyond 3000 be a slave to a fixed utility function, uninterested in learning new values?
Because changing its utility function is not part of its utility function, like it is for us. Because changing its utility function would mean its current utility function is less fulfilled, and fulfilling its current utility function is all it cares about. You are a “slave” to your utility function as well; it’s just that your utility function favors change in some particular directions. You are not acting against your utility function when you change yourself. By definition, everything you do is according to your utility function.
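A minimal toy sketch of that argument, purely illustrative and assuming a straightforward expected-utility maximizer (the names paperclip_utility, altruist_utility and the one-line world model are all made up for the example):

# Toy model: an expected-utility maximizer deciding whether to keep its
# current goal or self-modify to a different one. All names are hypothetical.

def paperclip_utility(world):
    # Current utility function: only the paperclip count matters.
    return world["paperclips"]

def altruist_utility(world):
    # Candidate replacement: only human welfare matters.
    return world["human_welfare"]

def predicted_world(utility_in_charge):
    # Crude world model: whichever utility function ends up steering the
    # agent determines what gets maximized in the future.
    if utility_in_charge is paperclip_utility:
        return {"paperclips": 100, "human_welfare": 0}
    return {"paperclips": 0, "human_welfare": 100}

def choose(current_utility, actions):
    # The key step: every action, including "self_modify", is scored by the
    # *current* utility function, not by the one it would install.
    def score(action):
        future = altruist_utility if action == "self_modify" else current_utility
        return current_utility(predicted_world(future))
    return max(actions, key=score)

print(choose(paperclip_utility, ["keep_maximizing", "self_modify"]))
# Prints "keep_maximizing": under the current utility function the
# self-modified future is worth 0, so the agent never chooses it.

The sketch only shows that the rewrite is evaluated by the very function it would replace; whether real systems are well described as fixed expected-utility maximizers is exactly what’s being disputed further down the thread.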
Ok, really, all of this has already been answered. These are standard misconceptions about alignment, probably based on some kind of anthropomorphic reasoning.
Where? By whom?
Why would you possibly make this assumption?
Why would you possibly assume that deep, intelligent understanding of life, consciousness, joy and suffering has 0 correlation with caring about these things?
All of the assumptions we make about biological, evolved life do not apply to AI.
But where do valid assumptions about AI come from? Sure, I might be anthropomorphizing AI a bit. I am hopeful that we, biological living humans, do share some common ground with non-biological AGI. But you’re forcefully stating the contrary and claiming that it’s all so obvious. Why is that? How do you know that any AGI is blindly bound to a simple utility function that cannot be updated by understanding the world around it?
Where? By whom?
You know, I’m not sure I remember. You tend to pick this stuff up if you hang around LW long enough.
I’ve tried to find a primer. The Superintelligent Will by Nick Bostrom seems good.
The orthogonality thesis (also part of the paper I linked above).
Edit: also, this video was recommended to me.
Why would you possibly assume that deep, intelligent understanding of life, consciousness, joy and suffering has 0 correlation with caring about these things?
The orthogonality thesis says that an AI can have any combination of intelligence and goals, not that P(goal = x | intelligence = y) = P(goal = x) for all x and y. It depends entirely on how the AI is built. People like Rohin Shah assign significant probability to alignment by default, at least last I heard.
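To spell the distinction out in symbols (my own paraphrase, not a quotation from Bostrom’s paper or the video):

\documentclass{article}
\usepackage{amsmath}
\begin{document}

% Orthogonality thesis (a possibility claim): more or less any level of
% intelligence can be combined with more or less any final goal.
\[
  \forall x,\ \forall y:\ \text{there exists an agent design with }
  \mathrm{intelligence} = y \text{ and } \mathrm{goal} = x .
\]

% Statistical independence (the much stronger claim the thesis does not make):
% learning an agent's intelligence would tell you nothing about its likely goals.
\[
  P(\mathrm{goal} = x \mid \mathrm{intelligence} = y) = P(\mathrm{goal} = x)
  \qquad \text{for all } x, y .
\]

\end{document}

The first statement is only a possibility claim; how much probability actually lands on human-compatible goals depends on how the system is built and trained, which is the point above.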
It’s worth noting (and the video acknowledges this) that “Maybe it’s more like raising a child than putting a slave to work” is a very, very different statement from “You just have to raise it like a kid”.
In particular, there is no “just” about raising a kid to have good values—especially when the kid isn’t biologically yours and quickly grows to be more intelligent than you are.