I tend to think that it's not so much the decision theory itself that matters, but rather how that particular agent makes decisions. That said, if one decision theory does turn out to simply be more rational, we might see agents with different starting mechanisms converge on it as AGI self-improves to ASI.
Agreed that this is critically important. See my post The alignment stability problem and my 2018 chapter Goal changes in intelligent agents.
I also think that prosaic alignment work generally isn't considering the critical importance of reflective stability. I wrote about this in my recent post Conflating value alignment and intent alignment is causing confusion.