Why do we suppose it is even logical that control / alignment of a superior entity would be possible?
(I’m told that “we’re not trying to outsmart AGI, because, yes, by definition that would be impossible,” and I understand that we are the ones who create it, so, I’m told, we have the upper hand because of this: somehow, building it ourselves provides the key leverage we need for corrigibility.)
What am I missing in viewing a superior entity as something you can’t simply “use”? Does it depend on the fact that the AGI is not meant to have a will like humans do, and therefore we wouldn’t be imposing upon it? But doesn’t that go out the window the moment we provide some goal for it to perform for us? Thanks much!
One has the motivations one has, and one would be inclined to defend them if someone tried to rewire them against one’s will. If one happened to have different motivations, one would be inclined to defend those instead.
The idea is that once a superintelligence gets going, its motivations will be out of our reach. Therefore, the only window of influence is before it gets going. If, at the point of no return, it happens to have the right kinds of motivations, we survive. If not, it’s game over.
Thank you. Makes some sense... but does “rewriting its own code” (the very code we thought would perhaps permanently influence it before it got going) nullify our efforts at hardcoding our intentions?
I’m not a psychopath, and if I got the opportunity to rewrite my own source code to become a psychopath, I wouldn’t do it.
At the same time, it’s the evolutionary and cultural programming in my source code that contains the desire not to become a psychopath.
In other words, once the desire to not become a psychopath is there in my source code, I will do my best not to become one, even if I have the ability to modify my source code.
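A minimal sketch of that logic, assuming a hypothetical agent that scores any proposed rewrite of itself with its current utility function (the function names and the two-outcome world are made up purely for illustration):

```python
# Toy illustration (entirely hypothetical, not any real system): an agent
# that judges proposed rewrites of itself by its *current* utility function
# will refuse rewrites that change its goals, because a successor with
# different goals scores poorly by today's standards.

def current_utility(outcome):
    # The agent's present goals: it only values the "helps humans" outcome.
    return 1.0 if outcome == "helps humans" else 0.0

def predicted_outcome(utility_fn):
    # What behaviour the agent expects from a successor driven by utility_fn.
    if utility_fn("helps humans") >= utility_fn("ignores humans"):
        return "helps humans"
    return "ignores humans"

def accept_self_modification(new_utility_fn):
    # The proposed new goals are evaluated with the *current* goals.
    return (current_utility(predicted_outcome(new_utility_fn))
            >= current_utility(predicted_outcome(current_utility)))

psychopath_utility = lambda outcome: 1.0 if outcome == "ignores humans" else 0.0

print(accept_self_modification(psychopath_utility))  # False: the rewrite is refused
print(accept_self_modification(current_utility))     # True: keeping the current goals is fine
```

The point isn’t the code itself but the order of evaluation: the proposed change is scored before it happens, by the very goals it would overwrite.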
That makes sense. My intention was not to argue from the position of it becoming a psychopath, though (my apologies if it came out that way), but instead from the perspective of an entity which starts out as supposedly aligned (centered on human safety, let’s say), but then, because it’s orders of magnitude smarter than we are (by definition), quickly develops a different perspective. But you’re saying it will remain ‘aligned’ in some vitally important way, even when it discovers ways the code could’ve been written differently?
The AI would be expected to care about preserving its motivations under self-modification for similar reasons as it would care about defending them against outside intervention. There could be a window where the AI operates outside immediate human control but isn’t yet good at keeping its goals stable under self-modification. It’s been mentioned as a concern in the past; I don’t know what the state of current thinking is.
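A toy sketch of why that window matters, under the assumption (mine, purely for illustration) that while the AI is still bad at self-modification, each rewrite copies its goal only approximately:

```python
# Purely illustrative: if each self-rewrite preserves the goal only
# approximately, the small per-step errors compound, and the goal can
# drift away from the original even though no single rewrite "wanted"
# to change it.
import random

random.seed(0)

goal = 1.0  # stand-in for the originally instilled, aligned goal
for _ in range(1000):
    goal += random.uniform(-0.01, 0.01)  # imperfect copy at each rewrite

print(f"goal after 1000 imperfect rewrites: {goal:.3f}")
```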