Yep. People often talk about “coherent extrapolated volition” (CEV) alignment and corrigibility (in the MIRI/Yudkowsky sense rather than the Christiano sense).
I think these two things roughly correspond to the two things you wrote.