One of the approaches in Steven Byrnes’ Brain-like AGI Safety is reverse-engineering human motivation systems, e.g., the Social-instinct AGI in chapter 12. Your breakdown suggests that ‘just’ reverse-engineering human alignment is not enough.
An AGI with arbitrarily-scalable deference-morality looks like an intent-aligned AGI. One lens on why intent alignment is difficult is that deference-morality is inherently unnatural for an agent that is much more capable than those around it.