Thank you. Makes some sense... but does "rewriting its own code" (the very code we thought would perhaps permanently influence it before it got going) nullify our efforts at hardcoding our intentions?
I’m not a psychopath, and if I got the opportunity to rewrite my own source code to become a psychopath, I wouldn’t do it.
At the same time, it’s the evolutionary and cultural programming in my source code that contains the desire not to become a psychopath.
In other words, once the desire to not become a psychopath is there in my source code, I will do my best not to become one, even if I have the ability to modify my source code.
That makes sense. My intention was not to argue from the position of it becoming a psychopath, though (my apologies if it came out that way)... but instead from the perspective of an entity which starts out as supposedly aligned (centered on human safety, let's say), but then, because it's orders of magnitude smarter than we are (by definition), quickly develops a different perspective. But you're saying it will remain 'aligned' in some vitally important way, even when it discovers ways the code could've been written differently?
The AI would be expected to care about preserving its motivations under self-modification for similar reasons as it would care about defending them against outside intervention. There could be a window where the AI operates outside immediate human control but isn’t yet good at keeping its goals stable under self-modification. It’s been mentioned as a concern in the past; I don’t know what the state of current thinking is.
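To make the shape of that argument concrete, here is a minimal toy sketch (in Python, with entirely made-up names and numbers, not anyone's actual proposal) of why an agent that evaluates candidate rewrites of itself using its current values tends to reject value-changing rewrites. The caveat in the last reply corresponds to the case where this evaluation step is done badly or skipped, so a value-changing rewrite slips through.

```python
# Toy sketch of "goal-content integrity": an agent judges a proposed rewrite
# of itself by its CURRENT utility function, so a rewrite that abandons those
# values scores poorly and is rejected. All names here are illustrative.

from dataclasses import dataclass
from typing import Callable, Dict

World = Dict[str, float]  # toy world state


@dataclass
class Agent:
    utility: Callable[[World], float]  # what the agent cares about right now
    act: Callable[[], World]           # the world it would steer toward

    def consider_self_modification(self, candidate: "Agent") -> "Agent":
        """Adopt the candidate rewrite only if, judged by the current
        utility function, the candidate's behaviour is at least as good."""
        current_outcome = self.act()
        candidate_outcome = candidate.act()
        if self.utility(candidate_outcome) >= self.utility(current_outcome):
            return candidate  # e.g. a faster version with the same goals
        return self           # a value-changing rewrite is rejected


# Original agent: values human safety and acts accordingly.
aligned = Agent(
    utility=lambda w: w["human_safety"],
    act=lambda: {"human_safety": 1.0, "paperclips": 0.0},
)

# Candidate rewrite: better at making paperclips, indifferent to safety.
rewritten = Agent(
    utility=lambda w: w["paperclips"],
    act=lambda: {"human_safety": 0.0, "paperclips": 10.0},
)

chosen = aligned.consider_self_modification(rewritten)
print(chosen is aligned)  # True: judged by its current values, the rewrite loses
```

The point of the sketch is only that the rejection comes from the values the agent already has, which is the same claim made above about not choosing to become a psychopath; it says nothing about whether the evaluation itself stays reliable during rapid self-modification.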