Expected utility maximization is only applicable when utility is known. When it isn't, various anti-Goodharting considerations become more important: maintaining the ability to further develop our understanding of utility/values without leaning too heavily on whatever the current guess happens to be. Keeping humans in control of the future is useful for that, but instrumentally convergent actions, such as grabbing the matter in the future lightcone (without destroying potentially morally relevant information such as aliens) and moving decision-making to a better substrate, are also helpful for whatever our values eventually settle into. The process should be corrigible: it should allow replacing humans-in-control with something better as understanding of what "better" means improves, without getting locked into that either. AI risk is the risk of failing to set up this process.
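To make the first point concrete, here is the standard expected-utility picture as a minimal sketch (the symbols A, P, U, and the proxy \hat{U} are illustrative notation, not anything specific from the discussion above):

% Standard expected utility maximization: pick the action whose
% expected outcome-utility is highest. This presupposes a fully
% specified utility function U over outcomes.
a^* = \arg\max_{a \in A} \; \mathbb{E}_{o \sim P(\cdot \mid a)}\big[\, U(o) \,\big]
    = \arg\max_{a \in A} \sum_{o} P(o \mid a)\, U(o)

% When only a current guess \hat{U} is available, optimizing
% \arg\max_{a} \mathbb{E}[\hat{U}(o) \mid a] hard is exactly the
% regime where Goodhart-style divergence between \hat{U} and the
% intended U becomes the dominant concern.

The point of the sketch is only that the argmax is undefined without a U to plug in, and substituting a guessed \hat{U} and optimizing hard against it is what the anti-Goodharting considerations are meant to guard against.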