Would this agent be able to reason about off switches? Imagine an AI getting out, reading this paper on the internet, and deciding that it should kill all humans before they realize what’s happening, just in case they installed an off switch it cannot know about. Or perhaps put them into lotus eater machines, in case they installed a dead man’s switch.
This approach works under the assumption that the AI knows everything there is to know about its off switch.
And an AI that would kill everyone just in case it had an off switch is one that desperately needs a (public) off switch on it.
The approach assumes that the AI knows everything there is to know about off switches in general, or at least everything its creators know about them.
If the AI can guess that its creators would install an off switch, it will attempt to work around as many classes of off switch as it can, and depending on how much of off-switch space it can outsmart simultaneously, whichever approach the creators chose might be useless.
Such an AI desperately needs more FAI mechanisms behind it; saying that it desperately needs an off switch assumes that off switches help.
This class of off switch is designed for the AI not to work around.