There are indeed two senses in which outside-view-style reasoning is helpful: when you're a biased yet reflective reasoner, and when the agent contains a true pointer to what humans want (i.e., when it's intent aligned). The latter is a special case of the former.
But it also seems like there should be some sense in which you can employ outside-view reasoning all the way down, meaningfully increasing corrigibility without assuming intent alignment. Maybe that's a confused thing to say. I still feel confused, at least.