This might be the most important alignment idea in a while.
Making an honest argument based on ideological agreements is a solidly good idea.
“Alignment” meaning alignment to one group is not ideal. But I’m afraid it’s inevitable. Technical alignment will always be easier with a simpler alignment target. For instance, making an AGI aligned to the good of all humanity is much trickier than aligning it to want to do what one particular human says to do. Following instructions is very nearly a subset of inferring desires, and one person (or a small group) is a subset of all of humanity — and much easier to define.
If that human (or their designated successor(s)) has any compassion and any sense, they’ll make it their own goal, and their AGI’s goal, to create fully value-aligned AGI. Instruction-following or intent alignment can be a stepping-stone to value alignment.
It is time to reach across the aisle. The reasons you mention are powerful. Another is to avoid polarization on this issue. Polarization appears to have completely derailed the discussion of climate change, which resembles alignment in being new and science-based. Current guesses are that the US Democratic Party would be prone to pick up the AI safety banner — which could polarize alignment. Putting existential risk, at least, on the conservative side might be a better idea for the next four years, and for longer if it reduces polarization by aligning US liberal concerns about harms to individuals (e.g., artists) and bias in AI systems with conservative concerns about preserving our values and our way of life (e.g., concerns that we’ll all die or be made obsolete).