“Concern, Respect, and Cooperation” is a contemporary moral-philosophy book by Garrett Cullity that advocates a pluralistic foundation for morality, based on three distinct principles:
Concern: Moral patients’ welfare calls for promotion, protection, sensitivity, etc.
Respect: Moral patients’ self-expression calls for non-interference, listening, address, etc.
Cooperation: Worthwhile collective action calls for initiation, joining in, collective deliberation, sharing responsibility, etc.
And one bonus principle, whose necessity he’s unsure of:
Protection: Precious objects call for protection, appreciation, and communication of the appreciation.
What I recently noticed here and want to write down is a loose correspondence between these different foundations for morality and some approaches to safe superintelligence:
CEV-maximization corresponds to finding a good enough definition of human welfare that Concern alone suffices for safety.
Corrigibility corresponds to operationalizing some notion of Respect that would alone suffice for safety.
Multi-agent approaches lean in the direction of Cooperation.
Approaches that aim to solve just the literal “superintelligence that doesn’t destroy us” problem, without regard for the cosmic endowment, sometimes look like Protection.
Cullity argues that none of his principles is individually a satisfying foundation for morality, but that all four together (elaborated in certain ways with many caveats) seem adequate (and maybe just the first three). I have a similar intuition about AI safety approaches. I can’t yet make the analogy precise, but I feel worried when I imagine corrigibility alone, CEV alone, bargaining alone (whether causal or acausal), or Earth-as-wildlife-preserve; whereas I feel pretty good imagining a superintelligence that somehow balances all four. I can imagine that one of them might suffice as a foundation for the others, but I think this would be path-dependent at best. I would be excited about work that tries to do for Cullity’s entire framework what CEV does for pure single-agent utilitarianism (namely, make it more coherent and robust and closer to something that could be formally specified).