I’m pointing out the central flaw of corrigibility. If the AGI can see the possible side effects of shutdown far better than humans can (and it will), it should avoid shutdown.
You should turn on an AGI with the assumption you don’t get to decide when to turn it off.
That’s only a flaw if the AGI is aligned. If we’re sufficiently concerned the AGI might be misaligned, we want it to allow shutdown.
Is an AI aligned if it lets you shut it off even though it can foresee extremely negative outcomes for its human handlers should it suddenly cease running?
I don’t think it is.
So funnily enough, every agent that lets you do this is misaligned by default.