This seems as misguided as arguing that any AGI will obviously be aligned because turning the universe into paperclips is stupid. We can conceivably build an AGI that is aware that humans are self-contradictory and illogical, and that therefore won’t assume they are rational, because it knows that doing so would make it misaligned. We can do at least as well as an overseer that intervenes on needless death and suffering as it happens.