David Althaus comments on Making AIs less likely to be spiteful

David Althaus 29 Sep 2023 12:19 UTC
11 points
Really great post!

It’s unclear how much human psychology can inform our understanding of AI motivations and relevant interventions, but it does seem relevant that spitefulness correlates highly (Moshagen et al., 2018, Table 8, N = 1,261) with several other “dark traits”, especially psychopathy (r = .74), sadism (r = .59), and Machiavellianism (r = .59).

(Moshagen et al. (2018) therefore suggest that “[...] dark traits are specific manifestations of a general, basic dispositional behavioral tendency [...] to maximize one’s individual utility— disregarding, accepting, or malevolently provoking disutility for others—, accompanied by beliefs that serve as justifications.”)

Plausibly there are reasons (evolutionary ones, for instance) why these traits correlate so strongly with each other, and understanding them better could perhaps inform interventions to reduce spite and other dark traits (cf. Lukas’ comment).

If this is correct, we might suspect that AIs that exhibit spiteful preferences/behavior will also tend to exhibit other dark traits (and vice versa!), which may be action-guiding. (For example, interventions that make AIs less likely to be psychopathic, sadistic, Machiavellian, etc. would also make them less spiteful, at least in expectation.)