My interpretation of what you’re saying here is that the overseer in step #1 can do a lot to bake in an interpretation of “help the user get what they really want” that leads the AI to try to eliminate human safety problems for the step #2 user (possibly entirely), but problems might still occur in the short term, before the AI is able to think/act to remove them.
It seems to me that this implies that IDA essentially solves the AI alignment portion of points 1 and 2 in the original post (modulo things happening before the AI is in control).