The point of corrigibility is to remove the instrumental incentive to avoid shutdown, not to prevent every negative outcome. Our civilization can work on addressing the side effects of shutdownability later, after we've made agents shutdownable.
I’m pointing out the central flaw of corrigibility: if the AGI can foresee the side effects of shutdown far better than humans can (and it will), then by its own judgment it should avoid shutdown.
You should turn on an AGI with the assumption that you don’t get to decide when to turn it off.