Is an AI aligned if it lets you shut it off even though it can foresee extremely negative outcomes for its human handlers if it suddenly stops running?
I don’t think it is.
So, funnily enough, every agent that lets you do this is misaligned by default.