Basically, I think your later section (“Maybe you think”) is pointing in the right direction: requiring a much higher standard than human-level moral judgment is reasonable, and consistent with the explicit standard set in essays by Yudkowsky and other MIRI people. CEV was about this; talk about philosophical competence or metaphilosophy was about this. “Philosophy with a deadline” would be a weird way to put it if you thought contemporary philosophy was good enough.
I don’t think this is the crux. E.g., I’d wager the number of bits you need to get into an ASI’s goals in order to make it corrigible is quite a bit smaller than the number of bits required to make an ASI behave like a trustworthy human, which in turn is way, way smaller than the number of bits required to make an ASI implement CEV.
The issue is that (a) the absolute number of bits for each of these things is still very large, (b) insofar as we’re training for deep competence and efficiency we’re training against corrigibility (which makes it hard to hit both targets at once), and (c) we can’t safely or efficiently provide good training data for a lot of the things we care about (e.g., ‘if you’re a superintelligence operating in a realistic-looking environment, don’t do any of the things that destroy the world’).
None of these points require that we (or the AI) solve novel moral philosophy problems. I’d be satisfied with an AI that corrigibly built scanning tech and efficient computing hardware for whole-brain emulation, then shut itself down; the AI plausibly doesn’t even need to think about any of the world outside of a particular room, much less solve tricky questions of population ethics or whatever.