My prediction is that there really is an evolved nudge towards empathy in the human motivational system, and that human psychology—usually being empathetic, but sometimes modulating it and often justifying self-serving actions—is sculpted by such evolved nudges, and wouldn’t be recapitulated in AI lacking those nudges.
I agree—this is partly what I am trying to say in the contextual modulation section. The important point is that the base capability for empathy might exist as a substrate that then gets sculpted by gradient descent / evolution into a wide range of adaptive pro- or anti-social emotions/behaviours. Which of these behaviours, if any, an AI ends up using will depend on the reward function / training data it sees.