This is extraordinarily impressive, even with the limits on your proofs. Assuming certain potentially minor things work out, this could be a great example of a MIRI-inspired success, and one of the likeliest to transfer to future AI models. Given that we usually want AIs to be shutdownable without compromising the most useful parts of expected utility maximization, this is a pretty extraordinary result.