Charlie Steiner comments on Corrigibility Via Thought-Process Deference

Charlie Steiner 29 Nov 2022 6:09 UTC
LW: 2 AF: 1
0
AF
One thing might be that I’d rather have an AI design that’s more naturally self-reflective, i.e. using its whole model to reason about itself, rather than having pieces that we’ve manually retargeted to think about some other pieces. This reduces how much Cartesian doubt is happening on the object level all at the same time, which sorta takes the AI farther away from the spec. But this maybe isn’t that great an example, because maybe it’s more about not endorsing the “retargeting the search” agenda.