The scenario I’m thinking of is one where we have a non-provably-Friendly AI (or an outright uFAI) but other existential risks still loom. (I suspect this scenario may be the default: it seems somewhat likely to me that AGI is within reach of this generation of humans, whereas it’s unclear whether something like provably Friendly AI is even possible, or how much value there is in an AI that is merely somewhat more stable than one hacked together.) It would be useful to understand what sorts of attractors a self-modifying AI could fall into, for either its decision theory or its utility function; what the implications of our decision to run a uFAI would be in terms of causal or acausal game theory; and, generally, what the heck we’d be knowingly inflicting on the multiverse if we decided to hit the big red button.