I agree that people’s actual moral views don’t track all that well with correct reasoning from their fundamental norms. Normative reasoning is just one causal influence on our views; plenty of biases, such as those stemming from status games, also play a causal role. That’s no problem for my theory: it carefully avoids the distortions and focuses on the paths of correct reasoning to determine the normative truths. In general, our conscious desires and first-order views don’t matter much on my view unless they are endorsed by the standards we implicitly appeal to when reflecting.
If anything, these status games and other biases are much more of a problem for Paul’s indirect normativity, since Paul pursues extrapolation by simulating the entire person, which includes not only their normative reasoning but also all their biases. Are the emulations getting wiser, or are they stagnating in their moral blind spots, being driven subtly insane by strange, unprecedented circumstances, or simply gradually becoming different people whose values no longer reflect the originals’?
I’m sure there are various clever mechanisms that can mitigate some of this (while likely introducing other distortions), but from my perspective these just seem like epicycles trying to correct for garbage input. If what we want is better normative reasoning, it’s much cleaner and more elegant to understand that process precisely and extrapolate from it, rather than from the entire contradictory, kludgy mess of a human brain.
Given the astronomical stakes, I’m just not satisfied with trusting any human’s morality. Even the most virtuous people in history are inevitably seen in hindsight to have had glaring moral flaws. Hoping you pick the right person, the one who will prove to be an exception, is not much of a solution. I think aligning superintelligence requires superhuman performance on ethics.
Now, I’m sympathetic to the concern that a metaethical implementation may be brittle, but I’d prefer to address this metaphilosophically. For instance, we should be able to check metasemantically whether our concept of ‘ought’ matches the metaethical theory programmed into the AI. Adding in metaethics, we may be able to extend that to cases where we ought to revise our concept to match the metaethical theory. In an ideal world, we would even be able to program in a self-correcting metaethical/metaphilosophical theory such that, so long as it starts from an adequate initial theory, it will eventually revise itself into the correct one. Of course, we’d still want to supplement this with additional checks, such as making it show its work and evaluating that alongside its solutions to unrelated philosophical problems.
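To make the self-correction idea a bit more concrete, here is a minimal toy sketch, assuming (purely for illustration) that a theory and its own revision rule could be formalized as program objects; every name in it (MetaethicalTheory, revise, self_correct, passes_supplementary_checks) is hypothetical and invented for this sketch, not part of any actual proposal under discussion.

```python
# Toy illustration only: run a hypothetical self-revision rule to a fixed point.
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MetaethicalTheory:
    """Stand-in for a formalized metaethical/metaphilosophical theory."""
    content: str


def self_correct(
    initial: MetaethicalTheory,
    revise: Callable[[MetaethicalTheory], MetaethicalTheory],
    max_rounds: int = 1000,
) -> MetaethicalTheory:
    """Apply the theory's own revision rule until it stops changing.

    The hope sketched above: starting from an adequate theory, repeated
    self-revision converges on the correct one (a fixed point).
    """
    theory = initial
    for _ in range(max_rounds):
        revised = revise(theory)
        if revised == theory:  # fixed point reached: no further self-revision
            return theory
        theory = revised
    raise RuntimeError("No fixed point reached; treat the output as untrusted.")


def passes_supplementary_checks(theory: MetaethicalTheory) -> bool:
    """Placeholder for the extra checks mentioned above: inspecting the shown
    work and evaluating solutions to unrelated philosophical problems."""
    return bool(theory.content)  # stub: the real checks are the hard part
```

The substantive philosophical work, of course, lives entirely inside the revision rule and the supplementary checks, which this sketch deliberately leaves as stubs.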