One of the approaches in Steven Byrnes’ Brain-like AGI Safety is reverse-engineering human motivation systems, e.g., the Social-instinct AGI in chapter 12. Your breakdown suggests that ‘just’ reverse-engineering human alignment is not enough.
An AGI with arbitrarily-scalable deference-morality looks like an intent-aligned AGI. One lens on why intent alignment is difficult is that deference-morality is inherently unnatural for an agent that is much more capable than those around it.