Yeah, this seems right. I can only repeat my earlier praise for FRI’s work on s-risks.
For a more technical angle, has anyone thought about making strong AIs stoppable by giving them wrong priors? For example, an AI built for physics research could start with a prior saying that the experiment chamber is the whole universe, that any "noise" coming in from outside is purely random and uncorrelated across time, and that a particular shape of "noise" means it should clean up and shut down. That way no amount of observation or self-improvement would let it infer our existence, so we'd always be able to shut it down. That should be easy to formalize in a cellular automaton world, though real physics is of course much harder.
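Here's a minimal sketch of what that might look like in a toy 1-D cellular automaton world. Everything in it (the chamber size, the rule-90 update, the names `operator_signal` and `SHUTDOWN_PATTERN`, the particular bit string) is an illustrative assumption rather than anything pinned down above: the AI's model treats the boundary cells as i.i.d. random bits, so no observation history shifts it toward "there's an outside world", while we on the outside can still inject the agreed-upon pattern to make it halt.

```python
import numpy as np

# Toy sketch (all names and parameters here are illustrative assumptions):
# a 1-D cellular automaton "chamber" whose boundary cells are driven from
# outside. The AI's prior says those boundary bits are i.i.d. uniform noise,
# so it can never infer an outside world -- but one agreed-upon bit pattern
# is hard-coded to mean "clean up and shut down".

rng = np.random.default_rng(0)

CHAMBER_SIZE = 64                        # cells the AI's prior calls "the whole universe"
SHUTDOWN_PATTERN = np.array(
    [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1]
)                                        # long enough that accidental matches are rare
HALT_AT = 500                            # step at which we, outside the chamber, send it

def operator_signal(t):
    """What actually drives the chamber boundary: genuinely random bits
    until the operators decide to stop the AI, then the shutdown pattern."""
    if t >= HALT_AT:
        return SHUTDOWN_PATTERN
    return rng.integers(0, 2, size=SHUTDOWN_PATTERN.size)

def step(state, boundary):
    """One rule-90 update (each cell becomes the XOR of its neighbours),
    with the boundary cells overwritten by the incoming 'noise'."""
    new = np.roll(state, 1) ^ np.roll(state, -1)
    new[: boundary.size] = boundary
    return new

def run_ai(max_steps=10_000):
    """The AI's loop. Under its (wrong) prior the boundary bits carry no
    information about anything beyond the chamber, so it never updates
    toward 'someone is out there'; it just watches for the pattern."""
    state = rng.integers(0, 2, size=CHAMBER_SIZE)
    for t in range(max_steps):
        boundary = operator_signal(t)
        if np.array_equal(boundary, SHUTDOWN_PATTERN):
            print(f"step {t}: shutdown pattern observed, cleaning up and halting")
            return
        state = step(state, boundary)
    print("no shutdown signal received")

if __name__ == "__main__":
    run_ai()
```

The point the sketch tries to isolate is only the prior: `run_ai` never treats the boundary bits as evidence about anything beyond the chamber, which is exactly what the wrong prior is supposed to enforce.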