Capybasilisk comments on Towards shutdownable agents via stochastic choice

Capybasilisk 19 Nov 2024 23:36 UTC
Is an AI aligned if it lets you shut it off even though it can foresee extremely negative outcomes for its human handlers if it suddenly stops running?

I don’t think it is.

So, funnily enough, every agent that lets you do this is misaligned by default.