> Nobody seems to have any real idea how to achieve either one
I think that’s not true; we in fact have a much better idea of how to achieve corrigibility / intent alignment. (I’m not going to defend that here. You could see my comment here, though that one only argues for why it might be easier rather than providing a method.)
Others will disagree with me on this.
> humans not being in charge is in itself an unacceptable outcome, or at least weighs very heavily against the desirability of an outcome
The usual argument I’d give is “if humans aren’t in charge, then we can’t course-correct if something goes wrong”. That’s an instrumental concern, not a terminal one. If we ended up in a world like this where humans were not in charge, that seems like it could be okay, depending on the details.