I said on Twitter a while back that much of the discussion about “alignment” seems vacuous. After all, alignment to what?
The designer’s intent? Often that is precisely the problem with software. It does exactly and literally what the designer programmed, however shortsighted. Plus, some designers may themselves be malevolent.
Aligned with human values? One of the most universal human values is tribalism, including willingness to oppress or kill the outgroup.
Aligned with “doing good things”? Whose definition of “good”?
That sounds like a list of starting questions for goalcraft. Incidentally, while I don't see Coherent Extrapolated Volition as a complete or final solution, it at least addresses all of the objections you raise, so if you haven't read the discussion of that, I recommend it; it would catch you up to where this discussion was 15 years ago.