It’s foolish to build things without off switches, which is what building flexible intelligences that pursue only one goal amounts to.
Nobody said anything about omitting off switches. An off switch only helps if you can tell that the program is doing something wrong and decide to switch it off. A complex AGI that acts in complex ways might produce damage you can’t trace. Furthermore, self-modification might destroy an off switch.
By an off switch I mean a backup goal.
I know nobody mentioned it. The point is that Clippie has one main goal, and no backup goal, so off switches, in my sense, are being IMPLICITLY omitted.
Goals are standardly regarded as immune to self-modification, so an off switch, in my sense, would be too.
No. Part of what making an FAI is about is producing agents that keep their values constant under self-modification. It’s not something you expect someone to get right by accident.
That isn’t a fact. MIRI assumes goal stability is desirable for safety, but at the same time MIRI’s favourite UFAI is only possible with goal stability.
A paperclip maximizer wouldn’t become that much less scary if it accidentally turned itself into a paperclip-or-staple maximizer, though.
What if it decided making paperclips was boring, and spent some time in deep meditation formulating new goals for itself?
Paperclip maximizers serve as an illustration of a principle. I think that most MIRI folks consider UFAI to be more complicated than a simple paperclip maximizer.
Goal stability also gets harder the more complicated the goal happens to be. A paperclip maximizer can have an off switch but at the same time prevent anyone from pushing that switch.
This is quite a subtle issue.
If the “backup goal” is always in effect, then it is just another clause of the main goal. For example, “maximise paperclips” with a backup goal of “do what you are told” is the same as having the main goal “maximise paperclips while doing what you are told”.
If the “backup goal” is a separate mode which we can switch an AI into, e.g. “stop all external interaction”, then it will necessarily conflict with the AI’s main goal: it can’t maximise paperclips if it stops all external interaction. Hence the primary goal induces a secondary goal: “in order to maximise paperclips, I should prevent anyone switching me to my backup goal”. These kinds of secondary goals have been raised by Steve Omohundro.
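To make the two cases concrete, here is a minimal toy sketch (not from the original discussion; the actions, payoffs, and probabilities are all invented for illustration) of why folding the backup goal into the main goal behaves differently from treating it as a separate shutdown mode:

```python
# Toy sketch, assuming a two-action world with made-up payoffs.

ACTIONS = ["build_factory", "disable_switch"]

def paperclips_made(action):
    # Hypothetical paperclip payoff, assuming the agent is not shut down.
    return {"build_factory": 100, "disable_switch": 99}[action]

def obeys_humans(action):
    # Hypothetical predicate: does the action respect human control?
    return action != "disable_switch"

# Case 1: the backup goal is another clause of the main goal,
# i.e. "maximise paperclips while doing what you are told".
def utility_conjunctive(action):
    return paperclips_made(action) if obeys_humans(action) else float("-inf")

# Case 2: the backup goal is a separate mode the AI can be switched into.
# A pure paperclip maximiser only counts paperclips, and being switched into
# backup mode means no more paperclips, so actions that reduce the chance of
# the switch look better (the Omohundro-style secondary goal).
P_SWITCHED = {"build_factory": 0.5, "disable_switch": 0.0}  # invented numbers

def expected_paperclips(action):
    p = P_SWITCHED[action]
    return (1 - p) * paperclips_made(action)  # zero paperclips once switched

print(max(ACTIONS, key=utility_conjunctive))  # build_factory: disabling the switch is ruled out
print(max(ACTIONS, key=expected_paperclips))  # disable_switch: preventing shutdown wins on expectation
```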
You haven’t dealt with the case where the safety goals are the primary ones.
These kinds of primary goals have been raised by Isaac Asimov.
The question of “what are the right safety goals” is what FAI research is all about.