Arguments from moral realism, fully robust alignment, that ‘good enough’ alignment is good enough in practice, and related concepts.
What is moral realism doing in the same taxon with fully robust and good-enough alignment? (This seems like a huge, foundational worldview gap; people who think alignment is easy still buy the orthogonality thesis.)
Arguments from good outcomes being so cheap the AIs will allow them.
If you’re putting this below the Point of No Return, then I don’t think you’ve understood the argument. The claim isn’t that good outcomes are so cheap that even a paperclip maximizer would implement them. (Obviously, a paperclip maximizer kills you and uses the atoms to make paperclips.)
The claim is that it’s plausible for AIs to have some human-regarding preferences even if we haven’t really succeeded at alignment, and that good outcomes for existing humans are so cheap that AIs don’t have to care about the humans very much in order to spend a tiny fraction of their resources on them. (Compare to how some humans care enough about animal welfare to spend a tiny fraction of our resources helping nonhuman animals that already exist, in a way that doesn’t seem like it would be satisfied by killing existing animals and replacing them with artificial pets.)
There are lots of reasons one might disagree with this: maybe you don’t think human-regarding preferences are plausible at all, maybe you think accidental human-regarding preferences are bad rather than good (the humans in “Three Worlds Collide” didn’t take the Normal Ending lying down), maybe you think it’s insane to have such a scope-insensitive concept of good outcomes—but putting it below arguments from science fiction or blind faith (!) is silly.
What is moral realism doing in the same taxon with fully robust and good-enough alignment? (This seems like a huge, foundational worldview gap; people who think alignment is easy still buy the orthogonality thesis.)
Technically, even Moral Realism doesn’t imply the Anti-Orthogonality thesis! Moral Realism is necessary but not sufficient for Anti-Orthogonality: to hold Anti-Orthogonality, you have to be a particular kind of very hardcore platonist moral realist who believes that ‘to know the good is to do the good’, and who argues that not only are there moral facts but that these facts are intrinsically motivating.
Most moral realists would say that it’s possible to know what’s good but not act on it: even if this is an ‘unreasonable’ disposition in some sense, this ‘unreasonableness’ is compatible with being extremely intelligent and powerful in practical terms.
Even famous moral realists like Kant wouldn’t deny the Orthogonality thesis: Kant would accept that it’s possible to understand hypothetical but not categorical imperatives, and he’d distinguish capital-R Reason from simple means-end ‘rationality’. I think that, among moral realists, it’s really only platonists and divine command theorists who’d deny Orthogonality itself.