By an off switch I mean a backup goal.
Goals are standardly regarded as immune to self-modification, so an off switch, in my sense, would be too.
This is quite a subtle issue.
If the “backup goal” is always in effect, then it is just another clause of the main goal. For example, “maximise paperclips” with a backup goal of “do what you are told” is the same as having the main goal “maximise paperclips while doing what you are told”.
If the “backup goal” is a separate mode which we can switch an AI into, e.g. “stop all external interaction”, then it will necessarily conflict with the AI’s main goal: it can’t maximise paperclips if it stops all external interaction. Hence the primary goal induces a secondary goal: “in order to maximise paperclips, I should prevent anyone switching me to my backup goal”. These kinds of secondary goals have been raised by Steve Omohundro.
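To make the induced secondary goal concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the function names, the probability of being switched (`p_switch`), the paperclip count (`clips`) and the cost of resisting (`resist_cost`) are all hypothetical, and a real agent's decision problem would be far richer. It only shows that an agent scoring actions purely by expected paperclips prefers to disable the switch whenever resisting costs less than the paperclips it expects to lose by being switched.

```python
def expected_paperclips(resist: bool, p_switch: float,
                        clips: float, resist_cost: float) -> float:
    """Expected paperclips in a toy model (all parameters are hypothetical)."""
    if resist:
        # Switch disabled: production always continues, minus the cost of resisting.
        return clips - resist_cost
    # Switch left intact: with probability p_switch the agent is put into
    # backup mode and produces nothing.
    return (1.0 - p_switch) * clips


def best_action(p_switch: float, clips: float, resist_cost: float) -> str:
    """Pick the action that maximises expected paperclips; ties go to 'allow'."""
    resist = expected_paperclips(True, p_switch, clips, resist_cost)
    allow = expected_paperclips(False, p_switch, clips, resist_cost)
    return "resist" if resist > allow else "allow"


if __name__ == "__main__":
    # A small chance of being switched is enough to make resisting worthwhile
    # when resisting is cheap.
    print(best_action(p_switch=0.1, clips=100.0, resist_cost=1.0))
    # If nobody will ever use the switch, resisting is pure cost.
    print(best_action(p_switch=0.0, clips=100.0, resist_cost=1.0))
```

Nothing about the switch appears in the agent's goal here; the preference for resisting falls out of the paperclip count alone, which is the point of the argument.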
You haven’t dealt with the case where the safety goals are the primary ones.
These kinds of primary goals have been raised by Isaac Asimov.
The question of “what are the right safety goals” is what FAI research is all about.