Moral realism[2], and for some reason the agent cares.
Not moral realism, but the practical equivalent: orthogonality may be technically true, but the default result of making something sufficiently smart is something human-friendly, and you’d have to have a very good mechanistic understanding of intelligence to create something unfriendly.
How would the first be different from the second? What do you understand by “moral realism makes the AI human-friendly” that is not just “practical convergence”?
Are you picturing something like “the AI reasons enough / reads this text (which would also convince humans if presented correctly) and becomes completely convinced that a certain moral theory is true, and that it must follow it (à la Descartes with God)”? Because that’s just a particular way of getting “practical convergence”. You probably agree with that (as you mentioned, the two are not mutually exclusive), but I’m interested in whether you meant anything else by moral realism here.