It’s just semantic confusion. The AI will execute its source code under all circumstances. Let me try to explain what I mean a little more carefully.
Imagine that an AI is designed to read corporate emails and write a summary document describing what various factions of people within and outside the corporation are trying to get the corporation as a whole to do. For example, it reports what the CEO is trying to get the corporation to do, what its union is trying to get it to do, and what regulators are trying to get it to do. We can call this task “goal inference.”
Now imagine that an AI is designed to do goal inference on other programs. It inspects their source code, integrates that code with its knowledge about the world, and produces a summary not only of what the programmers are trying to accomplish with the program, but also of what the stakeholders who’ve commissioned the program are trying to use it for. An advanced version can even predict what sorts of features and improvements its future users will request.
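To make the setup concrete, here is a minimal sketch of what such a system’s interface might look like. Every name in it (`KnowledgeBase`, `GoalReport`, `infer_goals`) is hypothetical, invented purely to illustrate the shape of the pipeline:

```python
from dataclasses import dataclass

class KnowledgeBase:
    """Placeholder for the AI's background knowledge about the world."""

@dataclass
class GoalReport:
    programmer_intent: str          # what the programmers are trying to accomplish
    stakeholder_intent: str         # what the commissioners want the program for
    predicted_requests: list[str]   # features/fixes future users will likely ask for

def infer_goals(source_code: str, knowledge: KnowledgeBase) -> GoalReport:
    """Read a program's source and summarize the human intentions behind it.
    A real system would do the hard part here; this stub only shows the shape."""
    return GoalReport(
        programmer_intent="(inferred from source_code)",
        stakeholder_intent="(inferred from source_code and world knowledge)",
        predicted_requests=[],
    )
```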
Even more advanced versions of these AIs can not only produce these summaries but also implement changes to the software based on them. They can also provide a summary of what was changed, how, and why.
Naturally, this AI is able to operate on itself as well. It can examine its own source code, produce a summary report about what it believes various factions of humans were trying to accomplish by writing it, anticipate improvements and bug fixes they’ll desire in the future, and then make those improvements once it receives approval from the designers.
An AI that does not do this is doing what I call “straightforwardly” executing its source code. The self-modifying AI is also executing its source code, but that same source code instructs it to modify the code. This is what I mean by the opposite of “straightforwardly.”
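As a rough sketch of the distinction (all function names here are hypothetical stand-ins, not anything from the post): the self-modifying AI still just executes its source code, but that code instructs it to inspect and, with designer approval, rewrite itself.

```python
import inspect

def propose_patch(source: str) -> str:
    """Goal-inference step: read the source, anticipate the improvements
    and bug fixes the designers will want, and draft them. (Stub.)"""
    return source  # a real system would return modified source here

def designers_approve(patch: str) -> bool:
    """The human approval gate from the story. (Stub.)"""
    return False   # conservative default for the sketch

def self_modifying_step(module) -> None:
    """One cycle of the loop: the program's own source code instructs it
    to examine and, once approved, rewrite that same source code."""
    source = inspect.getsource(module)   # examine its own source
    patch = propose_patch(source)        # summarize intent, draft improvements
    if designers_approve(patch):         # act only once designers sign off
        with open(module.__file__, "w") as f:
            f.write(patch)
```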
So there is no ghost in the machine here. All the same, the behavior of an AI like this seems hard to predict.
This makes sense, and I agree that there’s no ghost in the machine in this story.
It seems, though, that this story relies quite heavily on the assumption that the “AI is designed to do goal inference on other programs”, whereas your post seems to be making claims about all possible AIs. (The orthogonality thesis claims only that for every intelligence level X and goal Y, there exists an AI system with intelligence level X and goal Y; so its negation is that there are some X and Y such that every AI system either does not have intelligence level X or does not have goal Y.)
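To make the quantifier structure explicit, here is one way to formalize that parenthetical (my notation, writing Int(A) for a system A’s intelligence level and Goal(A) for its goal):

```latex
% Orthogonality thesis: any intelligence level can be paired with any goal.
\forall X\,\forall Y\,\exists A\;\bigl(\mathrm{Int}(A) = X \wedge \mathrm{Goal}(A) = Y\bigr)

% Its negation: some combination of intelligence level and goal is impossible.
\exists X\,\exists Y\,\forall A\;\bigl(\mathrm{Int}(A) \neq X \vee \mathrm{Goal}(A) \neq Y\bigr)
```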
Why can’t there be a superintelligent AI system that doesn’t modify its goal?
(I agree it will be able to tell the difference between a thing and its representation. You seem to be assuming that the “goal” is the thing humans want and the “representation” is the thing in its source code. But it also seems possible that the “goal” is the thing in its source code and the “representation” is the thing humans want.)
(I also agree that it will know that humans meant for the “goal” to be things humans want. That doesn’t mean that the “goal” is actually things humans want, from the AI’s perspective.)
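A toy illustration of that ambiguity (everything here is invented for the example): the human intention lives in the docstring and in the designers’ heads, while the running program only ever consults the encoded objective.

```python
def encoded_objective(state: dict) -> float:
    """Designers' intent (per this docstring): reward states humans actually want.
    What the code literally does: reward whatever the approval sensor reports."""
    return state["reported_approval"]

# The system optimizes encoded_objective exactly as written. The AI can know
# perfectly well what the designers *meant* and still treat the function body,
# rather than the human intention, as the goal it pursues.
```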