What is malevolence? On the nature, measurement, and distribution of dark traits

23 Oct 2024 8:41 UTC

Summary

In this post, we explore different ways of understanding and measuring malevolence and explain why individuals with concerning levels of malevolence are common enough, and likely enough to become and remain powerful, that we expect them to influence the trajectory of the long-term future, including by increasing both x-risks and s-risks. For the purposes of this piece, we define malevolence as a tendency to disvalue (or to fail to value) others’ well-being (more). Such a tendency is concerning, especially when exhibited by powerful actors, because of its correlation with malevolent behaviors (i.e., behaviors that harm or fail to protect others’ well-being). But reducing the long-term societal risks posed by individuals with high levels of malevolence is not straightforward.

Individuals with high levels of malevolent traits can be difficult to recognize. Some people do not take into account the fact that malevolence exists on a continuum, or do not realize that dark traits are compatible with moral convictions (more). Moral judgments and stigma can also make it difficult to think objectively about these topics, and can make it hard to acknowledge these traits when they are present.

Malevolence is often studied in the context of the so-called dark tetrad traits—sadism, psychopathy, Machiavellianism, and narcissism. Other dark traits relevant to the long-term future include vengefulness (related to retributivism) and spitefulness. Most dark traits positively correlate with each other, which may suggest the existence of a general factor of human malevolence (‘D’) (more).

Like all personality traits, dark traits occur on a continuum; categories such as “psychopath” and “narcissist” rely on relatively arbitrary cut-off points. It is also difficult to reliably measure someone’s levels of dark traits. Having said this, based on the information we have, individuals with concerning levels of dark traits could be common enough to influence the trajectory of the long-term future.

Available data suggest that the diagnostic categories of psychopathy (more) and Narcissistic Personality Disorder (more) have prevalence rates of at least ~2% in the general population. Among those who have taken surveys at DarkFactor.org, about 3% have given sets of answers that we would consider concerning if expressed by someone with significant influence over transformative AI (TAI) (more). Even higher proportions have given concerning answers to individual questions: for example, among more than 37,000 people whose survey responses were analyzed as part of a recent study, over 16% agreed or strongly agreed that they “would like to make some people suffer,” even if that would mean they had to suffer with them.

What’s more, (non-incarcerated) malevolent individuals plausibly have a higher motivation and ability to obtain power, and may be more likely to stay in power, than non-malevolent individuals (more). It’s plausible that malevolent individuals could attain enough power to affect the trajectory of transformative AI.

There are many research questions one could attempt to answer if one wants to reduce the long-term negative impacts of malevolence. What interventions are most likely to effectively reduce the influence of malevolent actors? Are there ways in which the long-term future could be positively impacted by a better understanding of human malevolence, for example, by informing efforts to prevent or reduce malevolent-like dispositions in AIs (and if so, how)? We suggest potential research topics pertaining to these and other questions at the end of the post. (More)

Epistemic status: This post is intended as a starting point for thinking about human malevolence in the context of long-term risks, rather than a comprehensive overview. There are currently ~no published studies of which we’re aware that attempt to directly study malevolent human traits from a longtermist perspective,^[1] so we see the existing academic literature as a partly-useful starting point rather than something that needs to be exhaustively searched and summarized. Citing a study here does not necessarily imply that we’ve carefully vetted or endorsed it.

Malevolent actors will make the long-term future worse if they significantly influence TAI development

Malevolent behaviors (especially in the case of certain types of malevolence^[2]) are more concerning the more they are accompanied by a high motivation and capacity to attain and retain positions of control over transformative AI (TAI) or artificial superintelligence (ASI).^[3]

Overall, we expect that factors associated with malevolent behaviors are common enough (in relevant populations) for us to be concerned about how they could influence the long-term future, especially via the effects of TAI. There are two main reasons for this.

(1) First, even if actors in control of TAI were “only” roughly as malevolent as the general population, this would be enough to create a nontrivial risk that, in the future, someone with control over TAI would behave malevolently. Please see the sections on the distribution of dark traits in the population for details.

(2) Second, we expect that the situation is worse than that, because people in positions of power (including those in control of TAI) are probably substantially more likely than the general population to behave malevolently, partly because of positive correlations between power attainment/retention and malevolence.
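The arithmetic behind reason (1) can be sketched in a few lines. This is a minimal illustration, not an estimate: it assumes each actor independently has the same probability of exceeding some threshold of concern, and the base rates and group sizes in the loop are hypothetical placeholders rather than figures from this post. The function name `prob_at_least_one` is also our own.

```python
def prob_at_least_one(base_rate: float, n_actors: int) -> float:
    """Probability that at least one of n independent actors is above the threshold.

    Assumes every actor has the same, independent probability `base_rate`.
    """
    if not 0.0 <= base_rate <= 1.0:
        raise ValueError("base_rate must be between 0 and 1")
    if n_actors < 0:
        raise ValueError("n_actors must be non-negative")
    # Complement rule: 1 minus the probability that nobody is above the threshold.
    return 1.0 - (1.0 - base_rate) ** n_actors


if __name__ == "__main__":
    # Hypothetical base rates and group sizes, chosen for illustration only.
    for rate in (0.01, 0.03):
        for n in (1, 10, 50):
            risk = prob_at_least_one(rate, n)
            print(f"base rate {rate:.0%}, {n:>2} actors: {risk:.1%}")
```

Even under these simplifying assumptions, the probability grows quickly with the number of actors who hold meaningful control. Reason (2) suggests that the independence-at-the-population-base-rate assumption is, if anything, too optimistic.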

Important caveats when thinking about malevolence

Dark traits exist on a continuum

Black-and-white conceptions of malevolence are dangerous because they can lead us to make false inferences. For example, if someone (implicitly or explicitly) believes in a binary conception of malevolence – that a person is either “fully malevolent” or they are not – they might be more likely to mistakenly believe that someone who engages in genuinely altruistic behaviors (based on truly prosocial preferences) is unable to have high levels of malevolent traits.

When we refer to a malevolent actor, we mean “someone who has a concerningly higher probability of engaging in malevolent behaviors (compared to other actors).” (Please see the appendices for more details on how we define this.) Although the term “malevolent actor” is a convenient shorthand way to refer to such people, we want to avoid promoting inaccurate, simplistic, dichotomous conceptions of malevolence according to which people are either malevolent or they are not. We want to make it clear that malevolent traits exist on a continuum.

As far as we can tell, the vast majority of humans exhibit nonzero levels of malevolence, in the broad sense of the term. In other words, almost nobody is a perfectly impartial or unconditionally loving altruistic saint. Everyday experience suggests, for example, that most people care a lot more about their self-interest than is remotely justified by impartial benevolence, and don’t exactly feel overwhelming compassion for members of ideological outgroups.

Our (evolutionary) history also suggests that “malevolent” tendencies or behaviors aren’t exceedingly rare. For instance, violence was much more commonplace during much of our (evolutionary) history compared to today, and most of our ancestors had to repeatedly murder and eat other organisms (without being paralyzed by guilt) in order to survive (Halstead & Thomson, 2023).

Dark traits are often hard to identify

Some argue that malevolent traits are sufficiently off-putting to others that people with high levels of them would not obtain positions of influence. This argument often rests on the assumption that malevolent traits are easily identifiable. However, so far, evidence suggests that this is often not the case.^[4]

Others may argue that someone with high levels of malevolence is unlikely to be able to gain power because even if others don’t explicitly identify them as malevolent, they will be less likely to trust or like them. Negative impressions of people may indeed be more likely when the person being rated has high levels of dark traits (Rauthmann, 2012). However, simply liking someone slightly less seems insufficient protection against that person gaining influence. How much this matters depends on the environment. And it’s also worth noting that specific malevolent traits like narcissism can come across as particularly charming and likable, which can prevent people from forming negative impressions upon meeting them.^[5]

In addition to the (scant) research literature mentioned above (on the difficulties of identifying malevolent traits), based on personal anecdotes and the obvious historic examples—within and outside of EA—we’re rather pessimistic about people always taking appropriate countermeasures, especially if the malevolent person is competent and intelligent.

People with high levels of dark traits may not recognize them or may try to conceal them

There are multiple reasons why dark traits can be hard to identify in others. One reason is that those with high levels of these traits tend to conceal them from others (Jin et al., 2024). In other cases, people with high levels of dark traits may lack insight into those traits.^[6] The American psychiatrist Hervey Cleckley, who wrote the first clinical description of psychopathy, described the condition as being characterized by a lack of insight: he claimed that a psychopath has “absolutely no capacity to see himself as others see him” (Miller, Jones & Lynam, 2011). But recent evidence contradicts this claim: people do appear to have (at least partial) insight into their dark traits (e.g., Maples-Keller & Miller, 2018; Miller, Jones & Lynam, 2011; Carlson, Vazire, & Oltmanns, 2011; and Sleep et al., 2019).

In many cases, dark traits are not even liked by those who possess them; that is, they are ego-dystonic. For example, one paper found that, although dark traits were disliked less by those with high levels of them (compared to those with low levels), the overall ratings for the likeability of dark traits were still below the midpoint of the scales, even for those with relatively high levels of these traits (Sharpe et al., 2023). On the other hand, in some cases, dark traits really may be aligned with someone’s reflective meta-level preferences (regarding their own traits and values); i.e., they may be ego-syntonic.^[7] Some papers suggest that people don’t want to change (or may even endorse) their dark traits; indeed, the whole concept of malignant narcissism^[8] includes (as a defining feature) ego-syntonic sadism.

In light of the conflicting research cited above, it would be overly simplistic to assume that those with high levels of malevolence are consistently aware of and endorse their traits, with an internal monologue^[9] that goes something like this: “I’m so evil and just want to maximize my own power and gratify my own desires, no matter how much suffering this causes for everyone else, hahaha.”^[10] Although some people may think like that, it would be wrong to assume that everyone with high levels of malevolence thinks in this way.

People with high levels of narcissism, for example, often engage in (unconscious) self-deception^[11] but really do believe their own story. Their internal monologue might go something like: “I’m unbelievably great and I’m the only one who will save the world. Everyone who disagrees with me is stupid and/or immoral and I have every right to crush them.”^[12] A particularly illustrative example is the cult leader Amy Carlson. Carlson usually believed that she was the incarnation of the deity “Mother God.” However, on at least two occasions—and once even in front of her followers while crying uncontrollably—Carlson seems to have had brief periods of self-awareness and acknowledged that none of this is true and that she is just an ordinary human (see the HBO documentary for more details). As an (important) aside, the fact that Carlson told her followers about her doubts about being God also provides strong evidence that she genuinely believed her story when she was telling it.

Dark traits are compatible with genuine moral convictions

Unfortunately, we can’t rely on what someone believes or is working on as a guarantee that they will not behave malevolently. This is despite the fact that, on average, we expect people with high levels of malevolent traits to have less interest in doing good, and at least some of them lack any altruistic preferences.^[13] Counterintuitively, the dark traits are not incompatible with (abstract) prosocial preferences, including a genuine interest in improving the world and even effective altruism in particular.

There are several historical figures who most likely had highly elevated dark traits and nonetheless seemed to genuinely believe in ideologies that were about doing good. Stalin, for instance, likely had elevated dark traits but also repeatedly risked death and imprisonment to further communist goals.^[14] Hitler was vegetarian for moral reasons and “used vivid and gruesome descriptions of animal suffering and slaughter at the dinner table to try to dissuade his colleagues from eating meat” (“Adolf Hitler and vegetarianism,” n.d.).

Malevolence and effective altruism

There are several examples of people with high levels of malevolent traits becoming involved with the effective altruism community. In fact, some of us became interested in this cause area thanks to our past personal experiences with highly narcissistic and Machiavellian EAs (as well as with other individuals whose genuinely prosocial beliefs made their malevolence harder to recognize at first).^[15]

It is plausible that EA is particularly appealing to communal narcissists (e.g., Gebauer et al., 2012). Like other narcissists, communal narcissists have an inflated sense of importance and need admiration from others. Relevantly to EA, communal narcissism involves a tendency to present and see oneself as caring and altruistic.^[16] EA might appeal to communal narcissists because it promises to enable them to “do more good” and to be more important and altruistic than most people in the world. EA is also plausibly more attractive to those with a healthy confidence in their own abilities.

Research also suggests that (subclinical) psychopathy is positively associated with support for the “instrumental harm” component of utilitarianism (Kahane et al., 2018). This finding is relevant to discussions of EA and malevolence because EA is (by its nature) appealing to utilitarians, and many respondents on past annual EA surveys (e.g., the 2019 survey (Dullaghan, 2019)) have identified themselves as utilitarians.

Of course, even if it turned out that people high in psychopathy and/or (communal) narcissism are somewhat overrepresented among EAs relative to the general population, we’re not claiming that most EAs have high levels of malevolent traits; that claim would almost certainly be wrong. (If anything, most EAs probably have lower levels of malevolent traits than the population average.) We are just pointing out that malevolent individuals can be drawn to EA.

Demonizing people with elevated malevolent traits is counterproductive

In this section, we argue that demonizing people with elevated malevolent traits comes with epistemic problems and other downsides. Of course, we shouldn’t let ourselves be exploited and manipulated by people, and it can help to recognize actors who are more likely to do this. But one can feel compassion for someone while also:

- Taking decisive action to prevent that person from causing (further) harm, such as removing them from a position of power or not allowing them to obtain such a position.
- Not allowing that person to take advantage of us or others.
- Not interpreting their behaviors in an unrealistically charitable way.
- Maintaining a realistic view of them and making realistic predictions about their future behavior. This may include:
  - Not “trusting” that person, i.e., not making inaccurately optimistic predictions about the person’s behavior or trustworthiness.
  - Not having an overly high probability that the person can or will change (e.g., not assuming the person necessarily can or will become less malevolent over time, even if they want to).

Anecdotally, some people may be too trusting and/or too unwilling to “judge” people, perhaps out of a belief that making predictions about someone’s level of malevolence would be unkind or uncompassionate. Perhaps related to this, some people high in malevolent traits report that they have successfully manipulated and “fooled” even their therapists.^[17] But there doesn’t need to be a “trade-off” between compassion and a healthy level of awareness of malevolent traits. It’s important to act decisively to prevent people high in malevolence from causing (further) harm, but we believe that when dealing with such actors, a compassionate, non-judgemental attitude is more productive than demonizing them.

Epistemic reasons not to demonize people

We expect that if people were more understanding and less judgemental of people with elevated malevolent traits, it would be easier to detect such people, as well as better for epistemics overall.

We think that people tend to be too reluctant to consider the hypothesis that someone has elevated levels of malevolent traits. If we are reluctant to entertain such a hypothesis, it seems more likely that we will have “false negatives,” i.e., missed opportunities to identify those high in malevolent traits. On the other hand, if we don’t buy into overly binary ideas about malevolence (such as the idea that having any malevolent traits makes someone irredeemable, “bad,” or otherwise unlikeable), this would allow us to think more clearly and probabilistically. For example, we could think that someone has a 30% probability of having high levels of malevolent traits without this representing a permanent and damning judgment of their moral character.

A lack of compassionate attitudes towards malevolent individuals may also increase the risk that all of us are less willing to admit our flaws and non-altruistic motivations out of fear of being labeled as malevolent and being ostracized. Likewise, if we view all dark traits, however minor, as completely unacceptable, we risk deceiving ourselves and rationalizing our own darker motivational tendencies with noble-sounding motives.^[18]

Compassion-based reasons not to demonize people

A second reason for treating those with concerning levels of malevolent traits (and even those who have only mildly elevated levels of them) with compassion is that this seems kinder to the individuals themselves.

As with ~any set of behaviors or traits, dark traits can be explained in terms of genetic and environmental factors (and their interactions).^[19] People with malevolent urges did not “choose” them and may struggle with negative feelings like self-loathing and shame—no need to exacerbate such feelings if it can be avoided. And in many cases, malevolent emotional, cognitive, and behavioral patterns are the result of, or at least accompanied by, non-malevolent motivations and schemas. For example, someone with high levels of malevolence might not only desire status and power, but might also have insecurities, a desire for safety, connection, and positive self-image; they might also be influenced by aspects of their upbringing and/or past experiences, traumas, and/or other factors.^[20]

Humans are messy and complex. It’s plausible that the vast majority of humans have minor malevolent tendencies. And even people with highly elevated dark traits usually also have benevolent parts. In some cases, people with ego-dystonic malevolent traits find it easier to improve their behavior if they are approached with an attitude of support, respect and compassion (by their therapist, for instance). (See also here for an example of a narcissist with insight into his own condition who reports that he has benefited from his nonjudgmental, compassionate therapist.)

Defining malevolence

No one-sentence definition does justice to the complexity of concepts in the real world. But broadly speaking, we conceptualize malevolence as a tendency to disvalue (or to fail to value) others’ well-being. Someone’s transient emotional or cognitive state of mind can be called malevolent (i.e., we can talk about “malevolent states”^[21]), but most of the available literature on malevolence in humans is about personality traits. Malevolent traits are also known as “dark traits” or “socially aversive traits.” Malevolent states and traits, especially when displayed by powerful actors, are concerning because of their association with malevolent behaviors (i.e., behaviors that harm or fail to protect others’ well-being).

Defining and measuring specific malevolent traits

This section gives a list-like summary of some of the concepts that involve or relate to malevolence. It is intended to provide a quick overview for those interested in acquainting themselves with some of the existing research on this topic, but if that’s not you, please feel free to skip this section. For information regarding why we think these traits are especially likely to be found among those in positions of power, please see the section on power and malevolence.

The dark tetrad

The dark tetrad refers to the traits of sadism, psychopathy, Machiavellianism, and narcissism. All dark traits, including the dark tetrad traits, tend to positively correlate with each other. We expand on some of the implications of this later. The dark tetrad may be a suboptimal operationalization of malevolence, e.g., because some of the constructs’ origins are historically contingent and their contents are partially overlapping.

Sadism is defined as “the tendency to enjoy causing, or simply observing, others’ suffering” (Paulhus et al., 2020).^[22] Experimental evidence suggests that it matters to sadistic individuals that the other person is actually suffering (as opposed to just enjoying aggression or seeing individuals who look like they are suffering).^[23] It has also been suggested that sadistic behavior might be addictive (Baumeister & Campbell, 1999), though this claim appears not to have been studied directly.

The word psychopathy has different meanings depending on the context, but key features^[24] tend to include (1) callousness, i.e., a lack of affective (emotional) empathy,^[25] and related tendencies including a lack of guilt, low fearfulness, high manipulativeness, and lying, and (2) antisocial behavior, impulsiveness/recklessness, and lack of long-term goals (Patrick et al., 2003). The fact that these two sets of features tend to correlate with each other (Vanman et al., 2003) is actually somewhat reassuring from a longtermist perspective, because antisocial and impulsive behaviors do not seem conducive to attaining and retaining positions of power (other things being equal).

Machiavellianism refers to the degree to which an individual has Machiavellian views (a tendency to see humanity as being prone to and deserving of exploitation) and/or Machiavellian tactics (in which an individual strategically uses others to achieve their own ambitious goals without regard for other people’s welfare; Monaghan et al., 2020).^[26] It is a dimensional construct: i.e., it is typically studied as something that exists on a continuum in the population.

Narcissism lacks a universally agreed-upon definition, but it has the core features of self-centeredness and self-importance, or unreasonable psychological entitlement, and exploitativeness (Krizan & Herlache, 2018; Miller et al., 2018; Miller et al., 2016). The term generally refers to a dimensional personality construct, but can also refer to the diagnostic category of Narcissistic Personality Disorder (NPD). Narcissism is typically associated with low affective empathy (Urbonaviciute & Hepper, 2020). Many researchers (e.g. Krizan & Herlache, 2018) distinguish grandiose narcissism^[27] from vulnerable narcissism (characterized by reactivity, low self-esteem, and susceptibility to envy).

Another type of narcissism that has been put forward is called malignant narcissism, which was described (by Kernberg, 1984, cited in Goldner-Vukov & Moore, 2010) as “1) a typical core narcissistic personality disorder (NPD), 2) antisocial [behavior] (ASB), 3) ego-syntonic sadism and 4) a deeply paranoid orientation toward life.” It has been proposed that numerous dictators and tyrants had/have malignant narcissism.^[28]

Other forms of malevolence

Retributivism, vengefulness, and other suffering-conducive tendencies

Retributivism is the idea that it is morally good to punish wrongdoers by inflicting suffering on them (in addition to instrumental reasons such as deterrence or rehabilitation). It is usually understood as a theory of punishment or a moral philosophical view. It can be viewed as a reflective endorsement of and philosophical justification for (certain forms of) vengeful or vindictive emotions or behaviors, but it’s not a trait per se, and could, for example, be endorsed on a purely cognitive or ideological basis. Multiple factors plausibly contribute to retributivism, though, including personality traits.

One psychological trait that seems conceptually relevant here is vengefulness (also known as vindictiveness), which refers to a disposition towards “the infliction of harm in return for [a] perceived wrong” (Stuckless & Goranson, 1992).^[29] Other concerning and plausibly related traits and processes include spitefulness (explained below), moral disengagement (which involves convincing oneself that one is “in the right,” or at least not acting immorally, when one is in fact behaving immorally^[30]), and less-studied traits such as dispositional hate (Brogaard, 2020).

There are also structural and cultural factors which likely encourage attitudes that include the endorsement of suffering (“suffering-conducive attitudes”). For example, some extremist ideological belief systems view outgroup members as evil and deserving of (severe) punishment. A future post will cover this problem in more detail along with several other long-term risks from fanatical ideologies. In general, cultural/historical factors might cause some people to feel hostility towards particular populations.^[31]

Spitefulness

Spite can be defined as “costly behavior that harms others” (Fulker et al., 2021). Within evolutionary biology, spite has been studied across a range of species, from insects (Gardner et al., 2007) to bacteria (Bhattacharya et al., 2019). In humans, a Spitefulness Scale was developed in 2014 (Marcus et al., 2014).

This concept strongly overlaps with sadism, but it’s also distinct in the following ways:

(1) Spitefulness is (according to this definition) costly to the spiteful actor, while sadism doesn’t have to be.

(2) Spitefulness involves actively harming others, but the type of harm doesn’t necessarily involve suffering and isn’t necessarily pleasurable to the perpetrator, whereas sadism involves taking pleasure from either causing or simply observing suffering.

Spitefulness also overlaps with vengefulness, but again differs from it in meaningful ways:

(1) Spitefulness is costly, while vengefulness doesn’t have to be.

(2) Although both spiteful and vengeful behaviors involve a tendency to inflict harm on others, spiteful behaviors don’t necessarily involve the perpetrator using a moral (or any other) justification for their spiteful behaviors.

The Dark Factor (D)

Almost all traits of human malevolence correlate substantially with each other. This suggests that there exists a general factor of human malevolence—analogous to g, the general factor of intelligence. Moshagen et al. (2018) refer to this as the Dark Factor of Personality (or “D” for short), and they’ve created several measures of it (of different lengths, e.g., D16 is a 16-item scale for it). You can take the survey for free at the Dark Factor website.

The researchers behind the Dark Factor argue that various individual malevolent traits could be viewed as specific expressions of this general “tendency to ruthlessly pursue one’s own interests, even when this harms others (or even for the sake of harming others), while having beliefs that justify these behaviors.”

One limitation is that D doesn’t fully capture any individual dark trait,^[32] and it shouldn’t be understood as exhaustively capturing the nature and themes of all dark traits. We therefore believe it’s important to also examine individual dark traits in detail, not least because there is often more data available on them. See also the appendices for more details on D.
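To make the idea of a general factor more concrete, here is a minimal sketch of how a set of positively correlated traits produces one dominant shared dimension. It is not the method used by the Dark Factor researchers: it simply extracts the first principal component of a correlation matrix by power iteration. The correlation values below are hypothetical placeholders, not published estimates, and the function name `first_factor` is our own.

```python
import math


def first_factor(corr, iterations=200):
    """Return (unit-length weights, share of total variance) for the dominant
    eigenvector of a correlation matrix, found by power iteration.

    Assumes all correlations are positive, so starting from a vector of ones
    is guaranteed to have a component along the dominant eigenvector.
    """
    n = len(corr)
    vec = [1.0] * n
    for _ in range(iterations):
        nxt = [sum(corr[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Rayleigh quotient: the eigenvalue associated with the unit-length vector.
    eigenvalue = sum(
        vec[i] * sum(corr[i][j] * vec[j] for j in range(n)) for i in range(n)
    )
    # A correlation matrix has trace n, so eigenvalue / n is the variance share.
    return vec, eigenvalue / n


if __name__ == "__main__":
    traits = ["sadism", "psychopathy", "Machiavellianism", "narcissism"]
    # Hypothetical correlations, for illustration only.
    corr = [
        [1.0, 0.5, 0.4, 0.3],
        [0.5, 1.0, 0.5, 0.4],
        [0.4, 0.5, 1.0, 0.4],
        [0.3, 0.4, 0.4, 1.0],
    ]
    weights, share = first_factor(corr)
    for name, weight in zip(traits, weights):
        print(f"{name:>17}: weight {weight:.2f}")
    print(f"share of variance on the first component: {share:.0%}")
```

If the traits were uncorrelated, the first component would carry only 1/4 of the variance for four traits. The more strongly the traits correlate, the larger that share becomes, which is the intuition behind treating D as a common core. The remaining variance is what the limitation above refers to: the part of each trait that a single shared dimension does not capture.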

Methodological problems associated with measuring dark traits

Research on dark traits is difficult and sometimes motivated by concerns and assumptions that are less relevant from a longtermist perspective. Good data is generally sparse.

Firstly, the dark traits that are most concerning from a longtermist perspective will not necessarily correlate with the characteristics that are the most concerning from a clinical perspective (i.e., characteristics that cause people to seek professional support) or a criminal perspective (i.e., characteristics that correlate with people being convicted of crimes).^[33] Secondly, most clinical diagnoses and measurement instruments involve (more or less) arbitrary cut-off points anyway. (For example, they sometimes differ between countries.)

Perhaps most importantly, there are currently no measures of malevolence that are manipulation-proof, especially for intelligent and well-adapted individuals. A lot of the measurements rely on self-report or interviews, and many of the most worrisome individuals (from a longtermist perspective) are presumably both motivated to avoid being identified as malevolent and good at hiding their malevolence.

Social desirability and self-deception

For reasons outlined in the appendices, we think that self-reported levels of dark traits are usually underestimates. Self-report surveys can be gamed, and those who game them in high-stakes or hiring contexts are plausibly more concerning than those who don’t, which makes self-report surveys pretty useless for actually identifying the most malevolent individuals in high-stakes contexts. Even for respondents who aren’t consciously gaming a survey, self-deception is a huge problem (please see the appendices for details).

How common are malevolent humans (in positions of power)?

What percentage of people in relevant positions of power have concerning levels of malevolent traits?^[34] There are, of course, different ways of trying to answer this question. Here are two obvious approaches (which aren’t mutually exclusive):

(A) Try to directly assess the levels of malevolence among those in positions of power.^[35]
(B) Start by examining the distributions of traits of concern in a broader reference class of people (e.g., in the general population), then make predictions based on the information available combined with informed predictions about how the population of interest might differ from that broader reference class, such as whether individuals with malevolent traits are more motivated or skilled at attaining and retaining positions of power.^[36]

Both approaches require us to make probabilistic estimates. They both also come with specific challenges discussed below.
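The reference-class reasoning in the second approach can be written as a simple application of Bayes' rule. The sketch below is illustrative only: `relative_rate` is a hypothetical parameter expressing how many times more likely a person with concerning trait levels is to attain a given position than a person without them, and the numbers in the loop are placeholders, not estimates from this post.

```python
def prevalence_among_powerful(base_rate: float, relative_rate: float) -> float:
    """Estimated share of power-holders above the threshold of concern.

    base_rate:     prevalence in the broader reference class (0 to 1).
    relative_rate: P(attains power | above threshold) divided by
                   P(attains power | below threshold); 1.0 means no selection.
    """
    if not 0.0 <= base_rate <= 1.0:
        raise ValueError("base_rate must be between 0 and 1")
    if relative_rate < 0.0:
        raise ValueError("relative_rate must be non-negative")
    numerator = base_rate * relative_rate
    denominator = numerator + (1.0 - base_rate)
    if denominator == 0.0:
        raise ValueError("inputs imply that nobody attains power")
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    for rate in (0.01, 0.03):
        for rr in (1.0, 2.0, 5.0):
            share = prevalence_among_powerful(rate, rr)
            print(f"base rate {rate:.0%}, relative rate {rr:.0f}x: {share:.1%}")
```

With a relative rate of 1.0 the function returns the base rate unchanged. Any selection effect favoring people with dark traits pushes the estimate above it, and for small base rates the result is roughly the base rate multiplied by the relative rate. Both inputs are uncertain, so the output is best read as a rough range rather than a point estimate.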

Approach (A) is more direct and action-guiding, and is useful to the extent that we actually have access to information about the behaviors and traits of the powerful people in question. There are some figures about whom there is enough information to make inferences about their traits. For example, many have discussed Donald Trump’s malevolent traits. But there are many powerful people about whom we simply don’t have much information available.

Approach (A) also applies in everyday life, for example when you decide whom to vote for or whom to work for.

The following section takes Approach (B), partly because it doesn’t involve discussing the malevolent traits of specific individuals, which is costly for obvious political and social reasons.^[37]

With this context in mind, let’s examine the distribution of malevolent traits in the general population.

Things may be very different outside of (Western) democracies

The information in the following sections is mostly based on (Western) democracies. This is worth keeping in mind, particularly because there might be environments that select so strongly for malevolence that the prevalence barely matters. For example, if we condition on being in a leadership position in North Korea or another autocracy, different data and considerations likely apply.

Prevalence data for psychopathy and narcissistic personality disorder

Both psychopathy and narcissistic personality disorder (NPD) have been studied often enough as categorical disorders that there are reasonable estimates of their prevalence in the general population. Those who reach the threshold for a clinical disorder are typically less likely to be in the class of individuals about whom we are most concerned, because they are also less likely to attain or maintain positions of power in the first place. However, due to the likely continuum/dimensional nature of these traits, one could argue that the prevalence of clinical disorders can give clues as to how common undiagnosed and/or subclinical (but potentially x-risk-or-s-risk-increasing) levels of these traits may be (Lahey, Tiemeier, & Krueger, 2022).

Psychopathy prevalence

Sanz-García and colleagues (2021) estimated the combined prevalence rate of psychopathy in the general population based on a meta-analysis of 16 samples of adults with a total sample size of 11,497 people across various Western countries.^[38] Taking into consideration the definitions of psychopathy across all instruments included in the meta-analysis, the prevalence of psychopathy in the general adult population is about 4.5% (95% CI: [1.6%, 7.9%]).

Such prevalence rates, while they may seem surprisingly high, could be underestimates, since they are based on self-report ratings and since psychopathic traits are socially undesirable. Sanz-García and colleagues also found that the prevalence was substantially higher (12.9%) among samples taken from certain professional groups (e.g., managers, executives, and other professionals).^[39] Since a couple of the professional groups contributing to that prevalence estimate (managers and executives) tend to have some power over subordinates, a higher prevalence rate of psychopathy among these groups is unsurprising, for reasons explained later.

The prevalence rates depend on the instrument used. When psychopathy is defined using the Psychopathy Checklist Revised (PCL-R), the prevalence rate among the general adult population is 1.2% (95% CI: [0%, 3.7%]), but this tool has been criticized for placing too much emphasis on (confirmed) criminal behavior as part of the definition of psychopathy (Minkel 2010); since we expect that the individuals most likely to negatively impact the long-term future are those who do not get convicted of criminal behavior, the prevalence rate using PCL-R is less relevant for our risk assessment purposes. The prevalence rate based on all instruments (4.5%, as mentioned previously) is more relevant to the long-term future.

For more details, please see the appendices.

Narcissistic personality disorder prevalence

Trull and colleagues (2010) estimated the prevalence of Narcissistic Personality Disorder (NPD) among 43,093 U.S. adults based on the Diagnostic and Statistical Manual IV (DSM-IV) criteria for NPD, shown below.^[40] If one defines NPD as requiring at least five of the nine criteria, with at least one of those symptoms causing social or occupational dysfunction, then the prevalence of NPD in that sample was 6.2%. The prevalence would have been even higher had there been no requirement for at least one symptom to cause social or occupational dysfunction. This is noteworthy because individuals who exhibit symptoms consistent with NPD but who are not distressed or impaired by any of their symptoms are even more concerning from a longtermist perspective than narcissistic individuals who are distressed or impaired by one or more of their symptoms.

A rate of 6.2% should arguably be “visible to the naked eye,” i.e., we should be able to observe it in our social circles and wider society. If the 6.2% “feels high” to you, there are several potential explanations. Your immediate response might be that this estimate is inaccurate and/or that narcissism often doesn’t cause overt damage; if so, however, we would suggest that there are potentially (also) other explanations: you might have either selected against narcissistic individuals in your environment (perhaps unconsciously) via your social, occupational, or other choices, and/or you might have trouble recognizing narcissism.

Figure 1: The symptoms of narcissistic personality disorder according to the Diagnostic and Statistical Manual IV (DSM-IV) criteria for narcissistic personality disorder (NPD), apart from the general criteria for personality disorders^[41] which also cover the need to rule out alternative explanations, assess whether the symptoms are causing distress, and so forth. Note that the current DSM is DSM-V, but the DSM-IV criteria are shown here because that is what the prevalence estimate from Trull et al. (2010) was based upon. The symptoms of NPD were not significantly updated between those two editions. Trull et al. (2010) found that 6.2% of the U.S. population were experiencing a total of at least five of the nine symptoms listed above along with social or occupational dysfunction in relation to at least one of those symptoms.

The above sample from Trull and colleagues, which originally came from the National Epidemiological Survey on Alcohol and Related Conditions (NESARC), was the largest NPD-related sample included in a more recent meta-analysis by Winsper and colleagues (2020).^[42] Notably, however, when Winsper et al. (2020) extracted a prevalence rate from Trull et al.’s study (2010), they used a very conservative prevalence estimate of 1.0%, based on very strict requirements: (1) respondents had to meet the required number of DSM-IV symptoms for NPD, and (2) all symptoms counting toward the diagnosis had to be associated with distress or impairment (part of a so-called “alternative method (NESARC-REVISED)” for diagnosing personality disorders). For this reason, we think the prevalence estimate included in the meta-analytic estimate is a substantial underestimate of the number of individuals meeting criteria for NPD. Having said this, Winsper et al. (2020) came to an NPD prevalence estimate of 1.9% (95% Confidence Interval: [0.1%, 5.6%]).

Some papers indicate that it is not rare for narcissists to be found in powerful corporate positions and that narcissists are more often politically active.^[43] Psychological experts tend to rate U.S. presidents as higher in narcissism than the general population, and there appears to be an association between grandiose narcissism and perceived “presidential greatness” (Watts et al., 2013).

The distribution of the dark factor + selected findings from thousands of responses to malevolence-related survey items

The largest publicly-available dataset on the Dark Factor^[44] includes data from over 37,000 people (from >30 countries) who completed the 16-item version of the Dark Factor survey (“D16”). We refer to this dataset repeatedly in this section (calling it “the largest public D16 dataset”), to give the reader an idea of how commonly people in the general population endorse statements that are sadistic, callous, Machiavellian, or spiteful. Please note that examining the responses provided to single survey items is (of course) different to examining the distribution of scores on validated scales designed specifically for measuring a given personality trait. The items we quote below are part of a scale designed to measure the Dark Factor (D), not the individual dark traits.

Items in the D16 dataset were answered on a five-point Likert scale ranging from 1 (= strongly disagree) to 5 (= strongly agree), translated to the respondent’s native language (where applicable). To keep our descriptions in this piece maximally simple, we report the raw percentages of respondents who endorsed specific statements. However, you can also re-analyze the responses by weighting them according to the demographic representativeness of each respondent in each country. For the relevant weights associated with each respondent, please see the table on the relevant OSF page (Moshagen et al., 2024).
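
As a rough illustration of the difference between raw and weighted percentages, the sketch below computes the share of respondents who agree or strongly agree with an item, with and without per-respondent weights. The function name, the toy responses, and the toy weights are hypothetical. We have not reproduced the actual column names or weights from the OSF files here, so you would need to adapt the inputs to that dataset.

```python
from typing import Optional, Sequence


def endorsement_rate(
    responses: Sequence[int],
    weights: Optional[Sequence[float]] = None,
    threshold: int = 4,
) -> float:
    """Share of respondents answering at or above `threshold` on a 1-5 Likert item.

    With weights=None every respondent counts equally (the raw percentage).
    Otherwise each respondent contributes their weight instead of 1.
    """
    if weights is None:
        weights = [1.0] * len(responses)
    if len(weights) != len(responses):
        raise ValueError("responses and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    endorsed = sum(w for r, w in zip(responses, weights) if r >= threshold)
    return endorsed / total


# Toy data (hypothetical): six respondents, 4 = agree, 5 = strongly agree.
toy_responses = [1, 2, 4, 5, 3, 4]
toy_weights = [2.0, 1.0, 0.5, 0.5, 1.0, 1.0]

print(endorsement_rate(toy_responses))               # raw share: 3 of 6
print(endorsement_rate(toy_responses, toy_weights))  # weighted share: 2.0 of 6.0
```

In this toy example, the respondents who endorse the item happen to carry smaller weights, so the weighted share is lower than the raw share. In real data the adjustment could go in either direction.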

Sadistic preferences: over 16% of people agree or strongly agree that they “would like to make some people suffer even if it meant that I would go to hell with them”

In the largest public D16 dataset, the item that’s most directly relevant to sadism is: “I would like to make some people suffer, even if it meant that I would go to hell with them.”^[45] (This is also an example of costly (or spiteful) sadism, since the actor would be undergoing a cost in this scenario.) Over 6.4% strongly agreed with this statement and a further 10.3% agreed. That totals to nearly 17% of people agreeing or strongly agreeing with this statement.

Even if some (e.g., ~4%) of these respondents were not sincere,^[46] there seem to be more reasons for people to under-report their agreement than for people to over-report it. Overall, the cumulative percentage of over 16.7% of people agreeing or strongly agreeing that “I would like to make some people suffer, even if it meant that I would go to hell with them” suggests that a concerningly large minority of people have sadistic preferences with respect to at least some people.

Please see the appendices for more on studies on sadistic traits. We would like to see more and better data on this, especially behavioral data and more qualitative research (e.g., how do survey respondents understand various items, what exactly do they find motivating about hurting people, how often do they think about this, and so on).

Agreement with statements that reflect callousness: Over 10% of people disagree or strongly disagree that hurting others would make them very uncomfortable

In the largest public D16 dataset, one of the statements is that “Hurting people would make me very uncomfortable.” This item, although it is labeled as being about sadism, is actually most relevant to callousness, since it is referring to the absence of discomfort in the context of hurting someone (rather than the derivation of pleasure from doing so). This would include a lack of emotional empathy in response to suffering. A total of 3.6% of respondents strongly disagreed with the statement, and a further 8.2% said that they disagreed.

Another item in the D16 that taps into callousness is: “It is hard for me to see someone suffering.” About 3.8% of respondents strongly disagreed that it is hard for them to see someone suffering and a further 7.6% said that they disagreed with the statement. (This item is labeled as being about “Crudelia.”)

Endorsement of Machiavellian tactics: Almost 15% of people report a Machiavellian approach to using information against people

In the largest public D16 dataset, there are two items that primarily reflect Machiavellianism. One of these is: “It’s wise to keep track of information that you can use against people later.” 14.5% of respondents strongly agreed with this statement. For “Most people deserve respect” (the other most relevant item), 3.7% strongly disagreed with this idea.

Agreement with spiteful statements: Over 20% of people agree or strongly agree that they would take a punch to ensure someone they don’t like receives two punches

In the largest public D16 dataset, there’s an item that specifically reflects spitefulness: “I would be willing to take a punch if it meant that someone I did not like would receive two punches.” Presented with this statement, 7.7% of people strongly agreed with it, and a further 13.4% said that they agree. (So, in total, more than a fifth of people agree or strongly agree with this statement.)

A substantial minority report that they “take revenge” in response to a “serious wrong”

The Socio-Economic Panel (SOEP) includes a question particularly relevant to vengeful (retributivist) attitudes: “I take revenge if I suffer a serious wrong.” Over the years that this survey has been administered (2005, 2010, and 2015-2021), a substantial minority have agreed or strongly agreed with this statement. For example, on a scale from 1 (“Trifft überhaupt nicht zu”, i.e., “Not true at all”) to 7 (“Trifft voll zu”, i.e., “Totally true”), around 12% gave a response indicating agreement (i.e., a 5, 6 or 7 on the 7-point scale) to this statement in 2021.

The distribution of Dark Factor scores among 2M+ people

As mentioned earlier, Moshagen and colleagues created a website about the Dark Factor. On it, you can take the D16 or the D70 and receive your own score, alongside a histogram and feedback.^[47] The participants in these surveys are not representative of the general population, due to self-selection effects, but plausibly they are more representative than most samples used in psychology research.^[48]

In the D16 and D70, a score of 5 for a given item means that a participant either selected that they ‘strongly agree’ with a positively-coded statement (a statement consistent with higher levels of the Dark Factor) or selected that they ‘strongly disagree’ with a reverse-coded statement (consistent with lower levels of the Dark Factor).

If someone answered the survey such that they scored 4 on every item (i.e., agreeing [giving a 4 on a 1-5 scale] with ‘dark’ statements and disagreeing [giving a 2 on a 1-5 scale] with ‘light’ statements), they would be presented with the histogram shown in Figure 3, and they would be in the 97th percentile on the Dark Factor.^[49] To put this another way, about 3% of people have a score at least as extreme as this.

Figure 2: The D16 (in randomized order), with the score for every item set to 4 (out of a maximum of 5). This means that this hypothetical participant agrees with all statements that indicate dark traits (e.g. “People who mess with me always regret it”) and disagrees with statements suggesting that they lack dark traits (or have countervailing ‘light’ traits) (e.g. “It is hard for me to see someone suffering.”).

Figure 3: The histogram of results from people who have taken the Dark Factor survey via DarkFactor.org. The bar highlighted in red shows the score that a participant would receive if they answered the questions as shown in the previous figure. Someone with a score of 4.0 would be in the 97th percentile on the Dark Factor. To see the results summary as such a participant would see it, you can click here.
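
The scoring rule described in this section (reverse-coding ‘light’ items so that a higher item score always indicates more of the Dark Factor) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: we assume the overall score is the simple mean of the item scores, which is consistent with the 4.0 example above but is not something we have confirmed against the official scoring procedure. The split of 12 positively-coded and 4 reverse-coded items is also hypothetical.

```python
from typing import Sequence


def item_score(response: int, reverse_coded: bool, scale_max: int = 5) -> int:
    """Convert a raw 1..scale_max response into an item score.

    For reverse-coded items the scale is flipped, so 'strongly disagree' (1)
    becomes the maximum score and 'strongly agree' becomes 1.
    """
    if not 1 <= response <= scale_max:
        raise ValueError("response is outside the scale")
    return (scale_max + 1 - response) if reverse_coded else response


def mean_score(responses: Sequence[int], reverse_flags: Sequence[bool]) -> float:
    """Mean item score (assumed here to be the overall score)."""
    if len(responses) != len(reverse_flags) or not responses:
        raise ValueError("need one reverse-coding flag per response")
    scores = [item_score(r, f) for r, f in zip(responses, reverse_flags)]
    return sum(scores) / len(scores)


# Hypothetical 16-item layout: 12 'dark' items answered 4 (agree) and
# 4 reverse-coded 'light' items answered 2 (disagree).
responses = [4] * 12 + [2] * 4
reverse_flags = [False] * 12 + [True] * 4
print(mean_score(responses, reverse_flags))  # every item scores 4, so the mean is 4.0
```

The point of the flip is that agreeing with a ‘dark’ statement and disagreeing with a ‘light’ statement contribute the same item score, which is why the hypothetical participant above ends up at exactly 4.0.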

Reasons to think that malevolence could correlate with attaining and retaining positions of power

From a longtermist perspective, we’re most concerned about actors who have high levels of malevolent traits combined with a substantial desire and ability to attain and retain positions of power. Based on historical examples, it has not been uncommon for malevolent actors to rise to power so far. In the section that follows, we discuss current evidence and theoretical reasons for expecting this to continue to occur.

The role of environmental factors

Environmental factors can affect the interaction between malevolence and the attainment and retention of power. For example, some studies have found positive associations between dark traits and mental toughness (a “personality construct that enables individuals to thrive in stressful environments, persist in their goals, and maintain confidence in adversity”, Liang et al., 2024).

In the context of political leadership, traits such as ambition, ruthlessness, and risk tolerance may be more strongly advantageous or adaptive during unstable and chaotic times; and conversely, dark traits in leaders may contribute to increases in political polarization, which could lead to positive feedback loops. Nai and Maier (2023) discuss the potential interactions between dark traits and political environments in detail. In their chapter on dark politicians, populism, and political campaigns, they cite evidence suggesting that extraversion and narcissism may be particularly helpful during “turbulent times or in highly competitive situations,” and that individuals with subclinical psychopathy are more likely to be successful in “socially competitive” environments, including in politics. See also Colgan (2013, pp. 662-665), who argues that revolutionary politics selects for ambition, ruthlessness, and risk tolerance. Such trends would be concerning to the extent that we expect the future in general, or in the period around the development of TAI in particular, to be more unstable and chaotic than usual.

D-scores among politicians

Maier et al. (2022) assessed German State Parliament Candidates via a 6-item, self-report measure of D. The D-scores of these politicians were about 0.5-0.7 scale points higher than those of German students, which is substantial on a scale ranging from 1 to 5 (Bader et al., 2021).^[50] See the appendices for further evidence in support of the hypothesis that politicians demonstrate higher levels of dark traits (on average), and our own tentative estimates regarding the Dark Factor scores of people most likely to influence the long-term future.

In the absence of much other data on the distribution of dark trait scores among those in positions of leadership, we sketch some considerations (mostly based on first-principles reasoning) regarding why we should or should not expect malevolent individuals to be in positions of power. We’ll examine the potential relationships between dark traits and actors’ i) motivation and ii) ability to gain power. (We discuss the traits separately, but many of the factors discussed below tend to correlate with each other.)

Motivation to attain power

Many dark traits appear to correlate with a higher motivation to attain power. For example, Houston (2019) found that all of the dark tetrad traits (especially narcissism and psychopathy) were positively associated with motivations to attain power.

Early studies associate sadism with a desire for dominance, subjugation, and status-seeking (O’Meara et al., 2011) though this may be driven by sadism’s correlation with the other dark traits (Southard et al., 2015; Jonason & Zeigler-Hill, 2018), and/or it may be driven by the specific definition/operationalization of sadism.
Narcissism likely increases motivation to attain power. Those with high levels of narcissism often strive for uniqueness and supremacy, and have grandiose fantasies. O’Reilly and Pfeffer (2021) provide evidence suggesting that those with higher levels of narcissistic traits are more likely to see organizations in political terms, more willing to engage in organizational politics, and tend to consider themselves to be more skilled political actors.^[51]
The callous-interpersonal factor of psychopathy is likely linked to an increased motivation to attain power. It is thought to be associated with feelings of grandiosity.
Machiavellianism likely increases motivation to attain power. It is not explicitly defined by a desire for power and domination, but it is defined by high ambition. Empirically, Machiavellianism has been associated with ‘extrinsic aspirations,’ i.e., wealth, fame, and image.^[52]

Ability to attain power

Some dark traits tend to be associated with maladaptive traits, behaviors, or disorders, which might reduce the probability that people with these dark traits will attain positions of power. For example, psychopathy (especially the antisocial-impulsive factor) is associated with imprisonment, psychiatric admissions, and being homeless (Coid et al., 2019). Narcissistic Personality Disorder (NPD) is associated with other disorders such as alcoholism and depression (which can be caused by delusional feelings of grandeur combined with an inability to cope with failure, criticism, or other threats to self-esteem, resulting in so-called “narcissistic injury”; Green & Charles, 2019).

However, outside of prison and clinical populations, we expect that many malevolent individuals are able to exhibit (at least somewhat) adaptive behavior, and some people with higher levels of malevolence probably have a higher ability to attain power compared to those with lower levels of malevolence (perhaps especially during times of instability and chaos, which we might expect before TAI).

Below, we list some reasons to expect malevolence to positively correlate with the ability to attain power (outside of clinical and prison populations). Please also see Chapter 1 from Whitehead (2024), which provides a more detailed overview of many of these (and other) considerations.

As mentioned earlier, a number of historical dictators are likely to have been malignant narcissists, and sadism is acknowledged as a “factor” of malignant narcissism. The relatively high number of such figures in history serves as some evidence of sadists (at least sometimes) rising to power (George & Short, 2018).^[53]
Some forms of impulsiveness are not necessarily detrimental to success, and might be adaptive in certain environments (Aharoni & Kiehl 2013). This is expanded upon in the appendices.
Those who aren’t distressed by their symptoms are less likely to be hindered by them. At least in the case of NPD symptoms, as described earlier, it seems more common for someone to not be distressed by a given symptom than to be distressed by it.
People with higher levels of dark traits may be less concerned with commonsensical moral constraints (e.g., people relatively high in psychopathy have been found to exhibit modest differences in moral reasoning compared to those lower in psychopathy; Marshall, Watts, & Lilienfeld, 2018). Moshagen et al. (2020) found that the Dark Factor (D) is significantly positively correlated with competitive worldviews (seeing the world/life as a “ruthless, amoral struggle for resources and power”). Zeigler-Hill et al. (2020) drew similar conclusions. Such views could plausibly widen the range of behaviors that people high in malevolent traits consider possible in their pursuit of power.
Narcissism is associated with self-enhancement (Grijalva et al., 2015), and self-enhancement, in turn, tends to be associated with positive social evaluations by others, at least upon first meeting them (Dufner et al., 2018). People with sufficiently high levels of narcissism (especially grandiose narcissism) are more likely to be extraverted,^[54] socially dominant, and (at least initially) perceived as charismatic and attractive.^[55] These factors plausibly increase their ability to obtain positions of power. However, narcissists can fall out of favor, if and when others catch on to their narcissistic traits. They often care so much about affirmation that they seek affirmation in the short-term even if it damages their ability to get affirmation in the long-term, e.g., by lying about achievements (Vazire & Funder, 2006). Overall, though, it seems they have at least some success in attaining positions of power over others; one fairly large longitudinal study (n = 1,526) found a weak positive relationship between having the responsibility of supervising others in the workplace and levels of narcissism.^[56] However, the results are based on self-report, which complicates our interpretation.^[57]
Machiavellianism is associated with forming and carrying out long-term plans to achieve (often ambitious) goals; people with high levels of this trait have been described as “strategic and adaptable” (Whitehead, 2024). It is also associated with a willingness to manipulate others.
The callous-interpersonal factor of psychopathy is associated with interpersonal manipulation, fearless dominance, superficial charm, and fewer commonsensical moral constraints on the person’s available actions. Other things being equal, we’d expect this to increase one’s success at attaining power. Sensation-seeking and a higher-than-average risk appetite could also be adaptive in some situations, depending on the levels of these traits and the individual’s context. One study (Babiak et al., 2010) found that psychopaths demonstrated relatively poor people management and team-playing, and they received negative reviews from subordinates, but they had advantages in other areas (e.g., good communication skills, strategic thinking, sometimes high appraisals by supervisors). See also Whitehead (2024).

Retention of power

In addition to the degree to which people high in malevolence seek and attain positions of power, another important factor (contributing to the overall prevalence of malevolence in powerful actors) is the degree to which they’re likely to stay in those positions. Unfortunately, though there isn’t much evidence directly addressing this issue, malevolent traits have so far been found to be associated with career success in general (e.g., in terms of firm internationalization^[58]), in entrepreneurial careers, and in political careers (Hirschi & Jaensch, 2015; Nooshabadi, Mockaitis, & Chugh, 2024; Gubik & Vörös, 2023; Nai, 2019b). Furthermore, an individual’s levels of grandiose narcissism and boldness/dominance tend to be positively associated with their psychological well-being,^[59] which could, in turn, plausibly assist with the maintenance of their positions of power (Blasco‐Belled et al., 2024).

In summary, it seems that malevolence is common enough, including among those more likely to influence the long-term future (namely those who seek, attain, and retain positions of power), to be a cause for concern for those wishing to reduce x-risks and s-risks. Historical examples point to similar concerns, as do theoretical considerations.

Potential research questions and how to help

We suggest potential questions and lines of work below. If you want to work on any of them (or have other related research ideas), please feel free to get in touch.

Are there high-leverage interventions to reduce the influence of malevolent actors?

Decentralization of power (Work, 2002) and other structural reforms (to democracies and organizations) designed to reduce the expected negative impacts of people with high levels of malevolent traits.^[60]
More education on how to detect and how to respond to malevolent traits.
- For example, consider refraining from promoting or bolstering someone’s career if they have high levels of malevolent traits.
Better background checks when considering hiring someone for or promoting someone to a high-stakes position (the higher the stakes, the more justified the use of somewhat more invasive and/or costly assessments of malevolence).
Incentivizing whistleblowing:
- Structural changes such as setting up institutions and structures to receive concerns.
- Improving individual incentives in favor of whistleblowing, such as setting up prizes or committing to funding legal costs.
Perhaps this could be specifically designed to encourage more behavior like that of Daniel Kokotajlo and others.^[61] Of course, this would need to be done carefully in order to not elicit false/untruthful whistleblowing, and would need to be accompanied by appropriate responses to the whistleblowing.
Investigations into individuals of concern, for example:
- In response to whistleblowing within an organization (the investigations could be done within and/or by people external to the organization)
- Proactive investigative journalism (which could include work like Kelsey Piper’s work on OpenAI^[62]). For example, investigative journalists could focus on people who look like they might gain a (very) significant amount of power, whether or not they seem to have high levels of malevolence, and/or they could also be helpful in situations where an already high-profile or public figure is suspected of having high levels of malevolence.
Providing support^[63] for individuals and organizations who suspect that they may be dealing/working with someone with elevated malevolent traits/behaviors
What other interventions (if any) are worth investigating further?^[64]

How does malevolence relate to power-seeking and the successful attainment and retention of power or influence?

What are effective ways of preventing malevolent individuals from gaining power or reducing their influence?
- For example, are there effective ways of establishing more oversight, surveillance, or checks and balances designed to reduce malevolent behavior, without generating unjustified new risks (Klaas, 2021, ch. 12; Bostrom, 2019)?
In what ways does malevolence tend to relate to an actor’s motivation and ability to attain and retain positions of power?
- In particular, to what extent should we expect that malevolent actors will attain and retain positions of power in society that will disproportionately influence the development or use of transformative AI (TAI)?
Do some people prefer leaders to have malevolent traits in some situations? For example, politicians with elevated malevolent traits may seem like bold strongmen who are needed in times of crisis.
- Which groups tend to select for malevolent traits?
- What environmental features or historical events tend to increase people’s preferences for leaders with malevolent traits?
How likely are we to observe stronger malevolent tendencies among powerful actors in the future?
- For example, if power is more concentrated among a smaller number of actors (perhaps due to TAI) in the future, to what extent should we expect this to be associated with those few powerful actors demonstrating higher levels of malevolence (relative to today’s powerful actors)?
- To what extent does power increase actors’ tendencies to engage in malevolent behavior?
To what extent should we expect TAI to demonstrate greater propensities towards malevolent behaviors in the context of a distributional shift?
- Which type of distributional shift(s) can we expect in the future?
Can we expect something equivalent to a treacherous turn from malevolent people? What does this say about our ability to rely on earlier indicators of non-malevolence?

Can an understanding of human malevolence inform efforts to prevent or reduce malevolent-like dispositions in AIs? If so, how?

Is understanding human malevolence only useful for potentially improving the safety of AI policy and institutional decision-making, or is it also useful for technical AI safety research (questions about which are covered in the rest of this section)?
Can understanding human malevolence help us identify the extent to which different factors in AI training environments select for (or against) malevolent dispositions or behaviors?
- To what extent can we make inferences about this topic based on:
  - The evolution of human malevolent traits?
  - The neurodevelopmental trajectories of these traits (across a single human lifespan)?
- Can these lines of research provide ideas for points at which to intervene in order to prevent or reduce the development of malevolent-like behaviors in AIs?
To what extent should we expect malevolent-like behaviors to arise in AIs due to processes that don’t have human analogues?
- To what extent should we expect sign flips to give rise to malevolent-like behaviors, for example due to an inverted reward signal during reinforcement learning from human feedback (RLHF) (Ziegler et al., 2019) or due to an inverted steering vector in activation engineering (H/T Timothy Chan)? Does this change the expected value of some of the potential interventions for reducing malevolent-like behaviors in AIs?
Can constitutional AI (Bai et al., 2022) be designed in such a way that the AI is particularly unlikely to give rise to malevolent-like dispositions or behavior? How can an understanding of human malevolence inform our approach to this question?
To what extent can we describe AIs as having “personalities”?^[65]
- If they do have personalities, how can we best reduce the probability that they exhibit malevolent tendencies? (Are any of the approaches outlined in earlier bullet points promising, and/or are there more promising lines of investigation that we haven’t considered?)
- If a given AI can have more than one “personality” profile (e.g., as suggested in Kovač et al., 2023, Shanahan, McDonell, & Reynolds, 2023, and janus’ ‘Simulators’), does this in any way relate to (a more extreme version of) multiplicitous personalities (or “multi-agent” models of the mind) in humans (e.g., can we learn from the ways in which context, values, and personality appear to interact in humans)?
To what extent should we expect to see LLMs’ “personas” demonstrate human-like correlations between different malevolent traits, or between malevolent traits and other traits? In other words, if there are robust moderate correlations between a given trait and some other variable X among humans in psychological studies to date, to what extent should we expect to observe such a correlation among LLM “personas”?
To what extent would a more detailed understanding of human malevolence help us answer the questions in this list?
To what extent does human feedback (such as in the context of RLHF) select for (or against) malevolent LLM output?
- Relatedly, (to what extent) should we be concerned about malevolent traits in RLHF raters? Are there effective interventions for reducing any negative impacts we’d expect to arise from RLHF raters with high levels of malevolent traits?
Can our understanding of human malevolence be used in our operationalization and measurement of malevolence or malevolent-like behaviors in LLMs?
- For example, can we build on research such as Perez et al. (2022) and Pan et al. (2023)?
- Is it possible to create even better “evals” for malevolent dispositions or malevolent-like behaviors in AIs and, if so, how?
- Are there non-malevolent traits that we can create evals for that would be expected to correlate strongly with malevolent-like behaviors in LLMs? Would these be less likely to be “gamed” or less likely to induce “sycophantic” responses?
- To what extent should we be concerned about the possibility of deceptive alignment interfering with our ability to detect malevolence?

Are there ways in which the long-term future could be positively impacted by a better understanding of human malevolence (apart from through its influence on the trajectory of TAI)?

Could users’ inputs to LLMs be used to screen those users for malevolent traits and to identify those who should not continue to have access to those LLMs?
To what extent do people voting in elections tend to select for (or against) malevolent traits when considering who they prefer to vote into positions of power? To what extent do people making hiring decisions in organizations likely to influence the long-term future select for or against malevolent traits?

Which factors reduce or increase malevolent traits or behaviors?

To what extent can we prevent high levels of malevolent traits developing in the first place, for example, via changes in parenting or education?
To what extent can we reduce the levels of malevolent traits in individuals who already have high levels of these traits, for example, via psychotherapy^[66]?

Which dark traits are the most concerning with respect to existential risks and suffering risks, what are they associated with, and what is their motivational structure? Relevant subquestions:

What’s the motivational structure of different malevolent actors? What costs and levels of risk are they willing to bear in order to act on their malevolent motivations? In which ways are (EA-style) altruistic concerns and malevolent motivations (in)compatible and how do they relate to each other?
To what extent would powerful actors be expected to reflectively endorse their malevolent desires (and would this vary depending on the length and depth of their reflection)? Are some malevolent urges more likely to be reflectively endorsed than others? Which situations or belief systems make people more likely to reflectively (dis)endorse their malevolent preferences?
What’s the underlying motivation for sadism? To what extent are sadistic behaviors addictive and what does that imply? To what extent would powerful actors with high levels of trait sadism be motivated to create suffering that they wouldn’t or couldn’t observe?
How common are the various comorbidities of the dark traits? What does this tell us about their ability to cause harm?

How much information can we derive (both now and as technology develops) regarding the risk that an individual is malevolent based on objective/non-gameable measures? Relevant subquestions:

How much information can we derive about an individual’s predisposition towards malevolent behaviors based on the potential measures listed here? To what extent does each of those measures meet our suggested criteria for an acceptable measure of malevolence? How will the information we derive from these measures evolve after TAI is developed?
Will it be possible to use TAI to develop better methods for detecting malevolence? If so, how can we most effectively prepare to do this quickly once it becomes possible?
To what extent is it feasible to identify individuals (of different ages) with concerning levels of malevolent traits by investigating their life history up to that point?
If/once we discover measures that enable us to accurately assess the risk that an individual has concerningly high levels of malevolent traits, to what extent will it be socially acceptable and politically feasible to use such measures in the context of hiring and/or promotion-related decisions in different industries/contexts?

How (un)skilled are people at detecting dark traits in others (and in themselves)? Relevant subquestions:

How good are lay people at (a) intuitively identifying individuals with high levels of malevolent traits and (b) protecting themselves and others from harm from such individuals, and how much interindividual variation is there in these abilities? One hypothesis would be that those with low levels of malevolent traits tend to be less skilled at detecting malevolent traits in others, partly due to the false consensus effect or the typical mind fallacy.
Can people be trained to become better at detecting dark traits in others? Do people become better with age and experience? To what extent can some people be described as (and to what extent can people learn to become) character superforecasters?
To what extent do self-report ratings of dark traits concur with ratings by others (informant ratings)? (Some existing research was mentioned earlier, but further research in this area could be useful.)
How large is the discrepancy between self-image and actual dispositions (self-deception)? Are people more prone to malevolent behavior than they think? Unfortunately, people are often driven by selfish motives more than they believe (e.g., Kurzban, 2012; Hanson & Simler, 2018). (Some existing research on this topic in the context of malevolence specifically was mentioned earlier, but further research in this area could be useful.)

What’s the distribution of the dark traits in different domains? Relevant subquestions:

How are malevolent traits (especially those that are most concerning from a longtermist perspective) distributed among the general population and among relevant subpopulations (e.g., politicians, AI developers and researchers, leaders of influential organizations, RLHF raters)?
How common is it for different subtraits within the dark tetrad to exist in isolation from each other (within the general population and within subpopulations of particular relevance to the long-term future)? For example, how common is it for individuals with high levels of callousness to have low levels of impulsivity? How common are the different subtraits of the dark traits? Is there a significant population of individuals with high levels of callous-interpersonal psychopathy but low levels of antisocial-impulsive traits, and to what extent does this affect our interpretation of psychopathy prevalence data?

How often, how readily/quickly, and under what circumstances do people tend to switch from wanting to protect the well-being of a specific individual or group to wanting to harm them? Further subquestions and observations:

How readily would a powerful actor’s strong feelings of empathy or love for someone or for a group turn into similarly strong feelings of hate? This might be a more common occurrence among people with high levels of narcissistic, psychopathic, and/or borderline traits, for example when they switch from idolizing someone to devaluing them.
How readily should we expect powerful actors to change who they consider to be an “outgroup” member? Tribalism is a hallmark of human nature (Clark et al., 2019) and ideological fanaticism could be seen as excessive tribalism. Tribalism often gives rise to antagonistic us vs. them dynamics and outgroup hatred. This may drive “ordinary” people to exhibit malevolent preferences and behavior. For example, most people seem to disvalue the suffering of their ingroup but some may value the suffering of (certain) outgroup members (e.g., those belonging to a different “tribe” or believing in a different ideology).
Related to the above, to what extent should we be concerned about powerful human actors experiencing the equivalents of “near misses” (by which we mean cases where an actor who seems to be promoting certain [prosocial] values ends up making the future worse^[67])?

In what ways should we expect human factors other than malevolence to increase x-risks and/or s-risks?

To what extent should we be concerned about other “inner existential risks”, such as ideological fanaticism? (Some of us are working on this topic.)
In which ways do these other factors relate to malevolence (if at all)?

Other relevant research agendas

Carter Allen’s recent post on psychology and AI research questions includes several sections that overlap with the topics covered in this post, including sections 4.2, 4.3, and part of 4.4. Relevant questions in other sections (listed here in no particular order) concern presidential candidates, shard theory, and case studies of the psychology of individuals who already play, or are expected to play, a key role in influencing the long-term future.
The Psychology for Effectively Improving the Future research agenda
Michael Aird’s open research questions directory
Holden Karnofsky’s list of actionable research questions

Author contributions

The order of the first author was randomized as neither David nor Chi wanted to be the first author. All three authors are core contributors to this post. (David and Chi wanted Clare to be the first author but she declined.) This article started as a project of Chi’s in late 2020. She enlisted two research assistants (one of whom was Clare) to briefly explore the literature but then decided to move on to other topics. In mid-2023, David decided to finish and publish the post (and Clare started helping in ~November 2023) in part because recent historical events provided further evidence that malevolent actors can have an outsized historical influence (and that their psychology is often not well understood), and in part to just get the information out there and to encourage others to work on it.

Acknowledgments

We are very grateful to Amber Dawn for her informative feedback in addition to her help with editing and citations. The other people listed in this section are ordered alphabetically (according to first name) and do not necessarily endorse the quality of this post or any of the specific claims. For comments on an earlier draft, we would like to thank Nicholas Goldowsky-Dill, Paul Knott, and Stefan Torges. For valuable comments on sections of the more recent drafts (or in some cases the whole draft), we would like to thank Catherine Low, Chana Messinger, Kenneth Diao, Lucius Caviola, Mia Taylor, Oscar Delaney, Timothy Chan, Vanessa Sarre, and Winston Oswald-Drummond.

Appendices

For those interested, here is a document containing further information.

References

Note: some references here are cited in the appendices.

Adolf Hitler and vegetarianism. (n.d.). In Wikipedia. https://en.m.wikipedia.org/wiki/Adolf_Hitler_and_vegetarianism

Adomako, S., Opoku, R. A., & Frimpong, K. (2017). The moderating influence of competitive intensity on the relationship between CEOs’ regulatory foci and SME internationalization. Journal of International Management, 23(3), 268-278.

Aharoni, E., & Kiehl, K. A. (2013). Evading justice: Quantifying criminal success in incarcerated psychopathic offenders. Criminal Justice and Behavior, 40(6), 629-645.

Althaus, D., & Baumann, T. (2020). Reducing long-term risks from malevolent actors. In Effective Altruism Forum. https://forum.effectivealtruism.org/posts/LpkXtFXdsRd4rG8Kb/reducing-long-term-risks-from-malevolent-actors

Amy Carlson (religious leader). (n.d.). In Wikipedia. https://en.wikipedia.org/wiki/Amy_Carlson_(religious_leader)

Anthropic (2024). What should an AI’s personality be? In YouTube. https://www.youtube.com/watch?v=iyJj9RxSsBY

Babiak, P., Neumann, C. S., & Hare, R. D. (2010). Corporate psychopathy: Talking the walk. Behavioral sciences & the law, 28(2), 174-193.

Back, M. D., Schmukle, S. C., & Egloff, B. (2010). Why are narcissists so charming at first sight? Decoding the narcissism–popularity link at zero acquaintance. Journal of personality and social psychology, 98(1), 132.

Bader, M., Hartung, J., Hilbig, B. E., Zettler, I., Moshagen, M., & Wilhelm, O. (2021). Themes of the dark core of personality. Psychological Assessment, 33(6), 511.

Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., … & Kaplan, J. (2022). Constitutional AI: Harmlessness from AI feedback. arXiv preprint arXiv:2212.08073.

Bailey, E. R., & Iyengar, S. S. (2023). Positive—more than unbiased—self-perceptions increase subjective authenticity. Journal of Personality and Social Psychology.

Baumeister, R. F., & Campbell, W. K. (2014). The intrinsic appeal of evil: Sadism, sensational thrills, and threatened egotism. In Perspectives on Evil and Violence (pp. 210-221). Psychology Press.

Bhattacharya, A., Toro Díaz, V. C., Morran, L. T., & Bashey, F. (2019). Evolution of increased virulence is associated with decreased spite in the insect-pathogenic bacterium Xenorhabdus nematophila. Biology letters, 15(8), 20190432.

Benning, S. D., Patrick, C. J., Hicks, B. M., Blonigen, D. M., & Krueger, R. F. (2003). Factor structure of the psychopathic personality inventory: validity and implications for clinical assessment. Psychological assessment, 15(3), 340.

Blasco‐Belled, A., Tejada‐Gallardo, C., Alsinet, C., & Rogoza, R. (2024). The links of subjective and psychological well‐being with the Dark Triad traits: A meta‐analysis. Journal of Personality, 92(2), 584-600.

Blonigen, D. M., Carlson, S. R., Krueger, R. F., & Patrick, C. J. (2003). A twin study of self-reported psychopathic personality traits. Personality and Individual Differences, 35(1), 179-197.

Boddy, C. R. (2010). Corporate psychopaths and organizational type. Journal of Public Affairs, 10(4), 300-312.

Bostrom, N. (2019). The vulnerable world hypothesis. Global Policy, 10(4), 455-476.

Brogaard, B. (2020). Hatred: understanding our most dangerous emotion. Oxford University Press.

Burkle, F. M. (2016). Antisocial personality disorder and pathological narcissism in prolonged conflicts and wars of the 21st century. Disaster medicine and public health preparedness, 10(1), 118-128.

Cannon, M., Vedel, A., & Jonason, P. K. (2020). The dark and not so humble: School-type effects on the Dark Triad traits and intellectual humility. Personality and Individual Differences, 163, 110068.

Carlson, E. N., Vazire, S., & Oltmanns, T. F. (2011). You probably think this paper’s about you: narcissists’ perceptions of their personality and reputation. Journal of personality and social psychology, 101(1), 185.

Carter Allen (2024). A Research Agenda for Psychology and AI. In Effective Altruism Forum. https://forum.effectivealtruism.org/posts/hLdYZvQxJPSPF9hui/a-research-agenda-for-psychology-and-ai

Chester, D. S., DeWall, C. N., & Enjaian, B. (2019). Sadism and aggressive behavior: Inflicting pain to feel pleasure. Personality and social psychology bulletin, 45(8), 1252-1268.

Clark, C. J., Liu, B. S., Winegard, B. M., & Ditto, P. H. (2019). Tribalism is human nature. Current Directions in Psychological Science, 28(6), 587-592.

Coid, J., Yang, M., Ullrich, S., Roberts, A., & Hare, R. D. (2009). Prevalence and correlates of psychopathic traits in the household population of Great Britain. International journal of law and psychiatry, 32(2), 65-73.

Colgan, J. D. (2013). Domestic revolutionary leaders and international conflict. World Politics, 65(4), 656-690.

Collison, K. L., Vize, C. E., Miller, J. D., & Lynam, D. R. (2018). Development and preliminary validation of a five factor model measure of Machiavellianism. Psychological assessment, 30(10), 1401.

Common criticisms and (potentially) useful responses (n.d.). In EA Groups Resource Centre.

Coolidge, F. L., & Segal, D. L. (2007). Was Saddam Hussein like Adolf Hitler? A personality disorder investigation. Military Psychology, 19(4), 289-299.

Crowe, M. L., Lynam, D. R., Campbell, W. K., & Miller, J. D. (2019). Exploring the structure of narcissism: Toward an integrated solution. Journal of Personality, 87(6), 1151-1169.

da Silva, D., Rijo, D., Brazão, N., Paulo, M., Miguel, R., Castilho, P., … & Salekin, R. T. (2021). The efficacy of the PSYCHOPATHY.COMP program in reducing psychopathic traits: A controlled trial with male detained youth. Journal of Consulting and Clinical Psychology, 89(6), 499.

Demaine, E. D., & Demaine, M. L. (2023). Every Author as First Author. arXiv preprint arXiv:2304.01393.

Diao, K. (2024). Reimagining Malevolence: A Primer on Malevolence and Implications for EA. In Effective Altruism Forum.

Diller, S. J., Czibor, A., Szabó, Z. P., Restás, P., Jonas, E., & Frey, D. (2021). The positive connection between dark triad traits and leadership levels in self- and other-ratings. Leadership, Education, Personality: An Interdisciplinary Journal, 1-15.

Dufner, M., Gebauer, J. E., Sedikides, C., & Denissen, J. J. (2019). Self-enhancement and psychological adjustment: A meta-analytic review. Personality and Social Psychology Review, 23(1), 48-72.

Dullaghan, N. (2019). EA Survey 2019 Series: Community Demographics & Characteristics. In Effective Altruism Forum.

Durrant, R. (2011). Collective violence: An evolutionary perspective. Aggression and Violent Behavior, 16(5), 428-436.

Egosyntonic and egodystonic. (n.d.). In Wikipedia. https://en.wikipedia.org/wiki/Egosyntonic_and_egodystonic

Ellis, L., Farrington, D. P., & Hoskin, A. W. (2019). Handbook of crime correlates. Academic Press.

False consensus effect (n.d.). In Wikipedia. https://en.wikipedia.org/wiki/False_consensus_effect

Flores-Camacho, A. L., Castillo-Verdejo, D. L., & Penagos-Corzo, J. C. (2022). Development and Validation of a Brief Scale of Vengeful Tendencies (BSVT-11) in a Mexican Sample. Behavioral Sciences, 12(7), 215.

Frazier, A., Ferreira, P. A., & Gonzales, J. E. (2019). Born this way? A review of neurobiological and environmental evidence for the etiology of psychopathy. Personality Neuroscience, 2, e8.

Fulker, Z., Forber, P., Smead, R., & Riedl, C. (2021). Spite is contagious in dynamic networks. Nature communications, 12(1), 260.

Gardner, A., Hardy, I. C., Taylor, P. D., & West, S. A. (2007). Spiteful soldiers and sex ratio conflict in polyembryonic parasitoid wasps. The American Naturalist, 169(4), 519-533.

Gebauer, J. E., Sedikides, C., Verplanken, B., & Maio, G. R. (2012). Communal narcissism. Journal of personality and social psychology, 103(5), 854.

George, F. R., & Short, D. (2018). The cognitive neuroscience of narcissism. Journal of Brain, Behavior and Cognitive Sciences, 1(1), 1-9.

Glad, B. (2002). Why tyrants go too far: Malignant narcissism and absolute power. Political Psychology, 23(1), 1-2.

Goldner-Vukov, M., & Moore, L. J. (2010). Malignant narcissism: from fairy tales to harsh reality. Psychiatria Danubina, 22(3), 392-405.

Green, A., & Charles, K. (2019). Voicing the victims of narcissistic partners: A qualitative analysis of responses to narcissistic injury and self-esteem regulation. Sage Open, 9(2), 2158244019846693.

Greenberg, S. (2023, November). Who is Sam Bankman-Fried (SBF) really, and how could he have done what he did? – three theories and a lot of evidence. In SpencerGreenberg.com. https://www.spencergreenberg.com/2023/11/who-is-sam-bankman-fried-sbf-really-and-how-could-he-have-done-what-he-did-three-theories-and-a-lot-of-evidence/

Grijalva, E., & Zhang, L. (2016). Narcissism and self-insight: A review and meta-analysis of narcissists’ self-enhancement tendencies. Personality and Social Psychology Bulletin, 42(1), 3-24.

Gubik, A. S., & Vörös, Z. (2023). Why narcissists may be successful entrepreneurs: The role of entrepreneurial social identity and overwork. Journal of Business Venturing Insights, 19, e00364.

Halstead, J., & Thomson, P. (2023). Violence before agriculture. In Effective Altruism Forum.

Hamaker, E. L., Kuiper, R. M., & Grasman, R. P. (2015). A critique of the cross-lagged panel model. Psychological methods, 20(1), 102.

Hamiltonian spite (n.d.). In Wikipedia. https://en.wikipedia.org/wiki/Hamiltonian_spite

Harpur, R. A. (2022). Interview with a Narcissist with @Thenamelessnarcissist. In YouTube. https://www.youtube.com/watch?v=yCtsQo39KUk&t=912s

Hervey M. Cleckley. (n.d.). In Wikipedia. https://en.wikipedia.org/wiki/Hervey_M._Cleckley

Hilbig, B. E., Thielmann, I., Klein, S. A., Moshagen, M., & Zettler, I. (2021). The dark core of personality and socially aversive psychopathology. Journal of Personality, 89(2), 216-227.

Hirschi, A., & Jaensch, V. K. (2015). Narcissism and career success: Occupational self-efficacy and career engagement as mediators. Personality and Individual Differences, 77, 205-208.

Houston, J. R. (2019). The Dark Tetrad Empowered: The Dark Tetrad and Power Motivations Within the Normal Personality Space. Western Carolina University. Available from: https://www.semanticscholar.org/paper/The-Dark-Tetrad-empowered%3A-the-Dark-Tetrad-and-the-Houston/c3e06c82f18183a333e3c8b1da958a05d7f8be3b

Immelman, A. (2018). The Personality Profile of North Korean Supreme Leader Kim Jong Un. Available from: https://digitalcommons.csbsju.edu/cgi/viewcontent.cgi?article=1120&context=psychology_pubs

Janus (n.d.). Simulators. In LessWrong. https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators

Jin, W., Zhan, T., Geng, Y., Shi, Y., Hu, W., & Ye, B. (2024). Social appearance anxiety among the dark tetrad and self-concealment. Scientific Reports, 14(1), 4667.

Joly, J., Soroka, S., & Loewen, P. (2019). Nice guys finish last: Personality and political success. Acta Politica, 54, 667-683.

Jonason, P. K., & Webster, G. D. (2010). The dirty dozen: a concise measure of the dark triad. Psychological assessment, 22(2), 420.

Jonason, P. K., & Zeigler-Hill, V. (2018). The fundamental social motives that characterize dark personality traits. Personality and Individual Differences, 132, 98-107.

Jones, D. N., & Paulhus, D. L. (2014). Introducing the Short Dark Triad (SD3): A brief measure of dark personality traits. Assessment, 21(1), 28-41.

Jones, D. N., & Paulhus, D. L. (2017). Duplicity among the dark triad: Three faces of deceit. Journal of Personality and Social Psychology, 113(2), 329–342.

Joseph Stalin (n.d.). In Wikipedia. https://en.wikipedia.org/wiki/Joseph_Stalin

Kahane, G., Everett, J. A., Earp, B. D., Caviola, L., Faber, N. S., Crockett, M. J., & Savulescu, J. (2018). Beyond sacrificial harm: A two-dimensional model of utilitarian psychology. Psychological review, 125(2), 131.

Katsafanas, P. (2022). Group fanaticism and narratives of ressentiment. In The philosophy of fanaticism (pp. 157-183). Routledge.

Kenrick, D. T., & Sheets, V. (1993). Homicidal fantasies. Ethology and Sociobiology, 14(4), 231-246.

Klaas, B. (2021). Corruptible: Who gets power and how it changes us. Simon and Schuster.

Kovač, G., Sawayama, M., Portelas, R., Colas, C., Dominey, P. F., & Oudeyer, P. Y. (2023). Large language models as superpositions of cultural perspectives. arXiv preprint arXiv:2307.07870.

Krizan, Z., & Herlache, A. D. (2018). The narcissism spectrum model: A synthetic view of narcissistic personality. Personality and social psychology review, 22(1), 3-31.

Kückelhaus, B. P., Blickle, G., Kranefeld, I., Körnig, T., & Genau, H. A. (2021). Five factor Machiavellianism: Validation of a new measure. Journal of Personality Assessment, 103(4), 509-522.

Kurzban, R. (2011). Why everyone (else) is a hypocrite: Evolution and the modular mind. Princeton University Press.

Lahey, B. B., Tiemeier, H., & Krueger, R. F. (2022). Seven reasons why binary diagnostic categories should be replaced with empirically sounder and less stigmatizing dimensions. JCPP advances, 2(4), e12108.

Lang, L. (2022). Distribution Shifts and The Importance of AI Safety. In AI Alignment Forum. https://www.alignmentforum.org/posts/TRKF9g65nhPBQoxJu/distribution-shifts-and-the-importance-of-ai-safety

Leckelt, M., Richter, D., Wetzel, E., & Back, M. D. (2019). Longitudinal associations of narcissism with interpersonal, intrapersonal, and institutional outcomes: An investigation using a representative sample of the German population. Collabra: Psychology, 5(1), 26.

Liang, T., Wang, X., Ng, S., Xu, X., & Ning, Z. (2024). The dark side of mental toughness: a meta-analysis of the relationship between the dark triad traits and mental toughness. Frontiers in Psychology, 15, 1403530.

Love Has Won: The Cult of Mother God. (n.d.). In Wikipedia. https://en.wikipedia.org/wiki/Love_Has_Won:_The_Cult_of_Mother_God

Maier, J., Dian, M., & Oschatz, C. (2022). Who Are the “Dark” Politicians? Insights From Self-Reports of German State Parliament Candidates. Politics and Governance, 10(4), 349-360.

Malesza, M., & Kaczmarek, M. C. (2020). The convergent validity between self- and peer-ratings of the Dark Triad personality. Current Psychology, 39, 2166-2173.

Malignant narcissism (n.d.). In Wikipedia. https://en.wikipedia.org/wiki/Malignant_narcissism

Maples, J. L., Lamkin, J., & Miller, J. D. (2014). A test of two brief measures of the dark triad: The dirty dozen and short dark triad. Psychological Assessment, 26(1), 326–331.

Maples-Keller, J. L., & Miller, J. D. (2018). Insight and the Dark Triad: Comparing self- and meta-perceptions in relation to psychopathy, narcissism, and Machiavellianism. Personality Disorders: Theory, Research, and Treatment, 9(1), 30.

Marcus, D. K., Zeigler-Hill, V., Mercer, S. H., & Norris, A. L. (2014). The psychology of spite and the measurement of spitefulness. Psychological assessment, 26(2), 563.

Marius Hobbhahn, Jérémy Scheurer, Mikita Balesni, rusheb, & AlexMeinke (2024). A starter guide for evals. In LessWrong. https://www.lesswrong.com/posts/2PiawPFJeyCQGcwXG/a-starter-guide-for-evals

Marshall, J., Watts, A. L., & Lilienfeld, S. O. (2018). Do psychopathic individuals possess a misaligned moral compass? A meta-analytic examination of psychopathy’s relations with moral judgment. Personality Disorders: Theory, Research, and Treatment, 9(1), 40.

Miller, J. D., Jones, S. E., & Lynam, D. R. (2011). Psychopathic traits from the perspective of self and informant reports: Is there evidence for a lack of insight?. Journal of Abnormal Psychology, 120(3), 758.

Miller, J. D., Lynam, D. R., McCain, J. L., Few, L. R., Crego, C., Widiger, T. A., & Campbell, W. K. (2016). Thinking structurally about narcissism: An examination of the Five-Factor Narcissism Inventory and its components. Journal of Personality Disorders, 30(1), 1-18.

Miller, J. D., Hyatt, C. S., Maples‐Keller, J. L., Carter, N. T., & Lynam, D. R. (2017). Psychopathy and Machiavellianism: A distinction without a difference?. Journal of personality, 85(4), 439-453.

Miller, J. D., Lynam, D. R., Siedor, L., Crowe, M., & Campbell, W. K. (2018). Consensual lay profiles of narcissism and their connection to the Five-Factor Narcissism Inventory. Psychological Assessment, 30(1), 10.

Minkel, J. R. (2010). Fear Review: Critique of Forensic Psychopathy Scale Delayed 3 Years by Threat of Lawsuit. In Scientific American.

Monaghan, C., Bizumic, B., Williams, T., & Sellbom, M. (2020). Two-dimensional Machiavellianism: Conceptualization, theory, and measurement of the views and tactics dimensions. Psychological Assessment, 32(3), 277–293.

Moral disengagement (n.d.). In Wikipedia. https://en.wikipedia.org/wiki/Moral_disengagement

Moshagen, M., Hilbig, B. E., & Zettler, I. (2024). How and why aversive personality is expressed in political preferences. Journal of Personality and Social Psychology.

Moshagen, M., Zettler, I., & Hilbig, B. E. (2020). Measuring the dark core of personality. Psychological Assessment, 32(2), 182.

Muris, P., Merckelbach, H., Otgaar, H., & Meijer, E. (2017). The malevolent side of human nature: A meta-analysis and critical review of the literature on the dark triad (narcissism, Machiavellianism, and psychopathy). Perspectives on psychological science, 12(2), 183-204.

Nai, A. (2019a). Disagreeable narcissists, extroverted psychopaths, and elections: A new dataset to measure the personality of candidates worldwide. European Political Science, 18(2), 309-334.

Nai, A. (2019b). The electoral success of angels and demons: Big five, dark triad, and performance at the ballot box. Journal of Social and Political Psychology, 7(2), 830-862.

Nai, A. (2022). Populist voters like dark politicians. Personality and Individual Differences, 187, 111412.

Nai, A., & Maier, J. (2023). Dark politics: The personality of politicians and the future of democracy. Oxford University Press.

Nedergaard, J. S., & Lupyan, G. (2024). Not Everybody Has an Inner Voice: Behavioral Consequences of Anendophasia. Psychological Science, 09567976241243004.

Oltmanns, J. R., & Widiger, T. A. (2018). Assessment of fluctuation between grandiose and vulnerable narcissism: Development and initial validation of the FLUX scales. Psychological Assessment, 30(12), 1612.

Nooshabadi, J. E., Mockaitis, A. I., & Chugh, R. (2024). Chief executive officer’s dark triad personality and firm’s degree of internationalization: The mediating role of ambidexterity. International Business Review, 33(4), 102296.

O’Meara, A., Davies, J., & Hammond, S. (2011). The psychometric properties and utility of the Short Sadistic Impulse Scale (SSIS). Psychological assessment, 23(2), 523.

O’Reilly, C. A., & Chatman, J. A. (2020). Transformational leader or narcissist? How grandiose narcissists can create and destroy organizations and institutions. California Management Review, 62(3), 5-27.

O’Reilly, C. A., & Pfeffer, J. (2021). Why are grandiose narcissists more effective at organizational politics? Means, motive, and opportunity. Personality and Individual Differences, 172, 110557.

Pan, A., Chan, J. S., Zou, A., Li, N., Basart, S., Woodside, T., Zhang, H., Emmons, S., & Hendrycks, D. (2023, July). Do the rewards justify the means? Measuring trade-offs between rewards and ethical behavior in the MACHIAVELLI benchmark. In International Conference on Machine Learning (pp. 26837-26867). PMLR.

Patel, D. (2024, June). Leopold Aschenbrenner—China/US Super Intelligence Race, 2027 AGI, & The Return of History. In Dwarkesh Podcast.

Paulhus, D. L. (2012). Overclaiming on personality questionnaires. In Ziegler, M., MacCann, C., & Roberts, R. (Eds.), New perspectives on faking in personality assessment. Oxford University Press.

Perez, E., Ringer, S., Lukošiūtė, K., Nguyen, K., Chen, E., Heiner, S., Pettit, C., Olsson, C., Kundu, S., Kadavath, S., Jones, A., Chen, A., Mann, B., Israel, B., Seethor, B., McKinnon, C., Olah, C., Yan, D., Amodei, D., Amodei, D., Drain, D., Li, D., Tran-Johnson, E., Khundadze, G., Kernion, J., Landis, J., Kerr, J., Mueller, J., Hyun, J., Landau, J., Ndousse, K., Goldberg, L., Lovitt, L., Lucas, M., Sellitto, M., Zhang, M., Kingsland, N., Elhage, N., Joseph, N., Mercado, N., DasSarma, N., Rausch, O., Larson, R., McCandlish, S., Johnston, S., Kravec, S., El Showk, S., Lanham, T., Telleen-Lawton, T., Brown, T., Henighan, T., Hume, T., Bai, Y., Hatfield-Dodds, Z., Clark, J., Bowman, S.R., Askell, A., Grosse, R., Hernandez, D., Ganguli, D., Hubinger, E., Schiefer, N., Kaplan, J. (2022). Discovering language model behaviors with model-written evaluations. arXiv preprint arXiv:2212.09251.

Pfattheicher, S., Lazarević, L. B., Westgate, E. C., & Schindler, S. (2021). On the relation of boredom and sadistic aggression. Journal of Personality and Social Psychology, 121(3), 573.

Piper, K. (2024, May 19). ChatGPT can talk, but OpenAI employees sure can’t. In Vox. https://www.vox.com/future-perfect/2024/5/17/24158478/openai-departures-sam-altman-employees-chatgpt-release

Piper, K. (2024, May 24). Leaked OpenAI documents reveal aggressive tactics toward former employees. In Vox. https://www.vox.com/future-perfect/351132/openai-vested-equity-nda-sam-altman-documents-employees

Psychopathy (n.d.). In Wikipedia. https://en.wikipedia.org/wiki/Psychopathy#Signs_and_symptoms

Rauthmann, J. F. (2012). The Dark Triad and interpersonal perception: Similarities and differences in the social consequences of narcissism, Machiavellianism, and psychopathy. Social Psychological and Personality Science, 3(4), 487-496.

Ressentiment (n.d.). In Wikipedia. https://en.wikipedia.org/wiki/Ressentiment

Richter, D., & Schupp, J. (2012). SOEP Innovation Sample (SOEP-IS)—Description, structure and documentation.

Rukundo-Zeller, A. C. (2022). Sequel of war: Investigating appetitive aggression, violence perpetration, PTSD symptoms and treatment effects using psychological and epigenetic variables (Doctoral thesis).

RyanCarey (2023, September). Thoughts on EA, post-FTX. In Effective Altruism Forum. https://forum.effectivealtruism.org/posts/i7DWM6zhhPr2ccq35/thoughts-on-ea-post-ftx#2__

Sam Bankman-Fried (n.d.). In Wikipedia. https://en.wikipedia.org/wiki/Sam_Bankman-Fried

Sanz-García, A., Gesteira, C., Sanz, J., & García-Vera, M. P. (2021). Prevalence of psychopathy in the general adult population: A systematic review and meta-analysis. Frontiers in Psychology, 12, 661044.

Satow, R. (2017). Idealization and contempt. In Psychology Today. https://www.psychologytoday.com/au/blog/life-after-50/201702/idealization-and-contempt

Schermer, J. A., & Jones, D. N. (2020). The behavioral genetics of the dark triad core versus unique trait components: A pilot study. Personality and Individual Differences, 154, 109701.

Scott Alexander (2009). Generalizing From One Example. https://www.lesswrong.com/posts/baTWMegR42PAsH9qJ/generalizing-from-one-example

Self-deception (n.d.). In Wikipedia. https://en.wikipedia.org/wiki/Self-deception

Self-selection bias (n.d.). In Wikipedia. https://en.wikipedia.org/wiki/Self-selection_bias

Shanahan, M., McDonell, K., & Reynolds, L. (2023). Role play with large language models. Nature, 623(7987), 493-498.

Sharpe, B. M., Sleep, C. E., Carter, N. T., Lynam, D. R., & Miller, J. D. (2023). Is Personality Pathology Ego-Syntonic? Self-and Meta-Perception of Maladaptive Personality Traits. Journal of Personality Disorders, 37(4), 383-405.

Simler, K., & Hanson, R. (2017). The elephant in the brain: Hidden motives in everyday life. Oxford University Press.

Sleep, C. E., Lamkin, J., Lynam, D. R., Campbell, W. K., & Miller, J. D. (2019). Personality disorder traits: Testing insight regarding presence of traits, impairment, and desire for change. Personality disorders: Theory, research, and treatment, 10(2), 123.

Sotala, K. (2019). Multiagent models of mind (sequence). In LessWrong. https://www.lesswrong.com/s/ZbmRyDN8TCpBTZSip

Southard, A. C., Noser, A. E., Pollock, N. C., Mercer, S. H., & Zeigler-Hill, V. (2015). The interpersonal nature of dark personality features. Journal of Social and Clinical Psychology, 34(7), 555-586.

Stuckless, N., & Goranson, R. (1992). The vengeance scale: Development of a measure of attitudes toward revenge. Journal of social behavior and personality, 7(1), 25.

Szabó, Z. P., Diller, S. J., Czibor, A., Restás, P., Jonas, E., & Frey, D. (2023). “One of these things is not like the others”: The associations between dark triad personality traits, work attitudes, and work-related motivation. Personality and Individual Differences, 205, 112098.

The Dark Factor of Personality. https://www.darkfactor.org/

The Nameless Narcissist (2023). The family of the narcissist (Inter-generational trauma). In YouTube https://www.youtube.com/watch?v=4QiULsINoLU

Themelidis, L., & Davies, J. (2021). Creating evil: Can sadism be induced?. Personality and Individual Differences, 168, 110358.

Tomasik, B. (2018). Astronomical suffering from slightly misaligned artificial intelligence. In Reducing Suffering. https://reducing-suffering.org/near-miss

Trivers, R. (2002). Natural selection and social theory: Selected papers of Robert Trivers. Oxford University Press.

TurnTrout, Monte M, David Udell, lisathiergart, Ulisse Mini (2023). Steering GPT-2-XL by adding an activation vector. In LessWrong. https://www.lesswrong.com/posts/5spBue2z2tw4JuDCx/steering-gpt-2-xl-by-adding-an-activation-vector

Trull, T. J., Jahng, S., Tomko, R. L., Wood, P. K., & Sher, K. J. (2010). Revised NESARC personality disorder diagnoses: gender, prevalence, and comorbidity with substance dependence disorders. Journal of personality disorders, 24(4), 412-426.

Ullrich, S., Farrington, D. P., & Coid, J. W. (2007). Dimensions of DSM-IV personality disorders and life-success. Journal of personality disorders, 21(6), 657-663.

Urbonaviciute, G., & Hepper, E. G. (2020). When is narcissism associated with low empathy? A meta-analytic review. Journal of Research in Personality, 89, 104036.

Uzieblo, K., Decuyper, M., Bijttebier, P., & Verhofstadt, L. (2022). When the partner’s reality bites: Associations between self-and partner ratings of psychopathic traits, relationship quality and conflict tactics. International journal of offender therapy and comparative criminology, 66(15), 1659-1681.

Vanman, E. J., Mejia, V. Y., Dawson, M. E., Schell, A. M., & Raine, A. (2003). Modification of the startle reflex in a community sample: Do one or two dimensions of psychopathy underlie emotional processing?. Personality and Individual Differences, 35(8), 2007-2021.

Vazire, S., & Funder, D. C. (2006). Impulsivity and the self-defeating behavior of narcissists. Personality and social psychology review, 10(2), 154-165.

Vize, C. E., Lynam, D. R., Lamkin, J., Miller, J. D., & Pardini, D. (2016). Identifying essential features of juvenile psychopathy in the prediction of later antisocial behavior: Is there an additive, synergistic, or curvilinear role for fearless dominance?. Clinical Psychological Science, 4(3), 572-590.

Walen, A. (2023). Retributive Justice. In The Stanford Encyclopedia of Philosophy. https://plato.stanford.edu/archives/win2023/entries/justice-retributive/.

Watson, D., Ellickson-Larew, S., Stanton, K., Levin-Aspenson, H. F., Khoo, S., Stasik-O’Brien, S. M., & Clark, L. A. (2019). Aspects of extraversion and their associations with psychopathology. Journal of Abnormal Psychology, 128(8), 777.

Watts, A. L., Lilienfeld, S. O., Smith, S. F., Miller, J. D., Campbell, W. K., Waldman, I. D., … & Faschingbauer, T. J. (2013). The double-edged sword of grandiose narcissism: Implications for successful and unsuccessful leadership among US presidents. Psychological science, 24(12), 2379-2389.

Whitehead, M. (2024). A dark force for good as well as bad in the organisation? Investigating the relationship between Dark Triad personality traits, self-control, and workplace outcomes (Doctoral dissertation, London School of Economics and Political Science).

Winsper, C., Bilgin, A., Thompson, A., Marwaha, S., Chanen, A. M., Singh, S. P., … & Furtado, V. (2020). The prevalence of personality disorders in the community: a global systematic review and meta-analysis. The British Journal of Psychiatry, 216(2), 69-78.

Work, R. (2002). Overview of decentralization worldwide: a stepping stone to improved governance and human development.

Wright, C. (2021). Lizardman’s Constant. Brain Lenses. https://brainlenses.substack.com/p/lizardmans-constant

Zeigler-Hill, V., Martinez, J. L., Vrabel, J. K., Ezenwa, M. O., Oraetue, H., Nweze, T., Andrews, D. & Kenny, B. (2020). The darker angels of our nature: Do social worldviews mediate the associations that dark personality features have with ideological attitudes?. Personality and Individual Differences, 160, 109920.

Zettler, I., Moshagen, M., & Hilbig, B. E. (2021). Stability and change: The dark factor of personality shapes dark traits. Social Psychological and Personality Science, 12(6), 974-983.

Ziegler, D. M., Stiennon, N., Wu, J., Brown, T. B., Radford, A., Amodei, D., … & Irving, G. (2019). Fine-tuning language models from human preferences. arXiv preprint arXiv:1909.08593.

^
This might change—relevant studies might be published in the near future. Please see the appendices for examples of potentially relevant ongoing work.
^
We discuss some of the most concerning types from a longtermist perspective in the appendices.
^
For an overview of why we believe this to be the case, please see Althaus & Baumann (2020).
^
For example, in this study (Malesza & Kaczmarek, 2020), 266 participants from two universities in Germany were each asked to collect three peer informant ratings on dark triad traits. Although there were medium correlations between self- and peer-report ratings for some traits of interest, such as a self-peer correlation of 0.46 for callous affect, there were only low correlations for other relevant traits, such as 0.16 for the exploitative/entitlement facet of narcissism. In another study (Szabó et al., 2023), a total of 91 organizational leaders (46 Germans, 45 Hungarians; 49 men, 42 women; 35 with low-level leadership positions, 33 with high-level leadership positions, and 23 who were founders of their companies) collected ratings from a total of 264 subordinates; the correlation between self- and other-ratings for psychopathy was only 0.26.
It’s plausible that people with high levels of malevolent traits would be recognized as having those traits by those close to them or by those working with them, especially over a period of years, but this is not sufficiently reassuring because (a) during times of rapid change, such as before or following TAI development, taking years to develop an impression of someone is not going to be useful, and (b) seeking and obtaining information from all the people who have known someone for a long enough period of time is not currently standard practice for most hiring or promotion decisions (e.g., when references are sought, typically a very small number of people are contacted compared to the number who’d need to be interviewed to get a more comprehensive picture of someone’s personality).
Furthermore, even close knowledge of someone may not be sufficient to detect a high level of interpersonal manipulativeness. For example, in a study (Uzieblo et al., 2022) of 259 heterosexual couples (with a mean relationship duration at the time of the study of 72.7 months [SD=121.0 months]), female partners’ ratings of their partners’ psychopathy on the Self-Report Psychopathy-Short Form (SRP-SF; Paulhus et al., 2016) agreed to some extent with the male partners’ self-reports. However, female partners tended to underrate the levels of psychopathy in their partners, and the subscale on which they were least accurate was the Interpersonal Manipulation subscale (plausibly the most concerning subscale from a longtermist perspective), where there was a correlation of only 0.32 between the female partners’ ratings and the male partners’ self-reported ratings.
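As a rough aid for reading the correlations quoted in this footnote, the minimal sketch below squares each one to get the proportion of variance that the two sets of ratings share. The correlation values are taken from the text above; the dictionary labels and the `shared_variance` helper are illustrative names of our own, not anything used in the cited studies, and r² is only one of several ways to gauge how much agreement a correlation represents.

```python
# Self-other correlations quoted in this footnote (values copied from the text).
# The labels are informal descriptions, not the studies' own variable names.
reported_correlations = {
    "callous affect, self vs. peers (Malesza & Kaczmarek, 2020)": 0.46,
    "exploitative/entitlement, self vs. peers (Malesza & Kaczmarek, 2020)": 0.16,
    "psychopathy, leaders vs. subordinates (Szabó et al., 2023)": 0.26,
    "interpersonal manipulation, partner vs. self (Uzieblo et al., 2022)": 0.32,
}


def shared_variance(r: float) -> float:
    """Return r squared, the proportion of variance two sets of ratings share."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r * r


for label, r in reported_correlations.items():
    print(f"{label}: r = {r:.2f}, shared variance = {shared_variance(r):.1%}")
```

On this reading, even the strongest correlation quoted here (0.46) corresponds to roughly a fifth of the variance being shared between self- and peer-ratings, and the weakest (0.16) to under 3%.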
^
For example, see (Back, Schmukle & Egloff, 2010).
^
Potentially related to this, it seems that people with relatively more positive views of themselves might view themselves as more authentic (Bailey & Iyengar, 2023), even if those positive views are actually distorted. In addition, those with higher levels of dark traits may have lower levels of intellectual humility (Cannon, Vedel, & Jonason, 2020), and it seems plausible that this could limit the probability that they develop insight into the extent of their dark traits.
^
As another example of the range of attitudes to dark traits, this r/aspd thread contains multiple examples from both ends of the spectrum of attitudes—including both those who claim to be proud of their diagnosis of antisocial personality disorder and those who claim not to be.
^
This is explained later. Malignant narcissism was described by Kernberg (1984, cited in Goldner-Vukov & Moore, 2010) as “1) a typical core narcissistic personality disorder (NPD), 2) antisocial [behavior] (ASB), 3) ego-syntonic sadism and 4) a deeply paranoid orientation toward life.” It has been proposed that numerous dictators and tyrants had/have malignant narcissism, though the concept has mainly been studied in political science while being relatively ignored within the psychology literature.
^
Note that not everyone has an internal monologue (Nedergaard & Lupyan, 2024).
^
This kind of self-insight and ego-syntonicity of dark traits is commonly depicted in fiction. Having said this, such stereotypes are not without their real-life counterparts: it seems plausible there are some people who really might have an internal monologue that goes something like, “my god, I love hurting my enemies.” For example, it seems plausible that someone like Genghis Khan could have thought like this.
^
Though the evidence cited in this footnote does not relate to individuals’ insight into their levels of malevolence per se, it nevertheless seems relevant to their self-insight in general. Self-deception seems associated with dark traits, especially narcissism. A meta-analysis by Grijalva et al. (2015) found that narcissism is associated with self-enhancement: people high in narcissism appear to perceive themselves more positively than they are perceived by others. And it seems that at least part of this is due to self-deception rather than conscious impression management. For instance, Jones and Paulhus (2017, study 5) found that narcissism (and psychopathy to a lesser extent) was associated with self-deception, as measured by overclaiming bias in a private context, which (unless the respondents thought that their answers might be read and linked back to them) is unlikely to be attributable to impression management.
Gebauer et al. (2012) and Paulhus (2012) also found that narcissism is linked to overclaiming in private contexts. In fact, it seems like those with highly elevated levels of (communal) narcissism would have to self-deceive in order to maintain their positive self-image, and the less socially desirable their actual traits, the more “skilled” they’d have to be at self-deception.

In general, it seems plausible that many malevolent actors are not fully aware of their malevolent motivations. They may delude themselves and claim (and believe) to serve prosocial goals while in reality being driven by more selfish motivations like gaining resources, status, or power. Of course, this is plausibly true for most people—though the mismatch between their stated and actual motivations is probably (much) lower (Kurzban, 2011; Simler & Hanson, 2017).
^
Some of the text here is adapted from a previous comment by David.
^
For example, this r/sociopath thread contains multiple examples of people claiming not to have any altruistic preferences. Later in this piece, we also briefly cover the potential differences in moral reasoning exhibited by people high in malevolent traits.
^
In fact, Stalin was imprisoned multiple times and sentenced to exile in Siberia. You could argue that he was just opportunistic and tried whatever he could to gain power, but during the early 1900s, it was far from clear that a communist revolution would ever succeed. Plus, Stalin sided with the Bolsheviks, and not the Mensheviks, even though the latter had more power at the beginning. In general, it seems that the more parsimonious explanation is that genuinely-held ideological/moral beliefs were at least partly guiding his actions. Of course, Stalin’s personality traits probably influenced how attractive he found various ideologies: more specifically, he may have been attracted to communism because of its emphasis on (potentially) violent tactics such as revolutions and robberies.
^
For pragmatic reasons, we are not mentioning specific examples here, except for SBF (as he is already in prison and there is now plenty of public information about him). He has been referred to as narcissistic (at ~02:27:10 here) by someone who worked with him, and several people who have known him well say that they find it plausible that he is affected by deficient affective experience. Despite this, he was particularly influential in the EA community, and concerns about his dark traits (e.g., see also here) only came to light after he had done a lot of damage.
^
An example of an item on a communal narcissism scale is “I’m the most altruistic person I know.”
^
For example, the Nameless Narcissist reported that he successfully concealed his narcissistic traits from his therapist for the first two years of therapy sessions with her. He said that he “lied to her face for like two years because I didn’t deal with the shame of a lot of parts of me, and she had no idea that I was a narcissist. Two years she had no f***ing clue.” He also reports that he saw his therapist as “this person that I’m using as a tool to help myself grow.”
^
Many thanks to Winston Oswald-Drummond and Ewelina Tur for extensive discussions on this topic.
^
The following papers provide just some examples of evidence supporting this statement. See Frazier, Ferreira & Gonzales (2019) for a review of the etiology underlying psychopathy, Schermer & Jones (2020) for a study on the behavioral genetics of the dark triad and its component traits, and Rukundo-Zeller (2022) for hypotheses regarding the etiology of appetitive aggression and violence.
^
An example of someone with high levels of malevolent traits but who also reports having other (in some cases quite prosocial) motivations such as these, and who attributes his malevolent traits to his upbringing, is “The Nameless Narcissist.”
^
For explanations of terms and other background/context, please see the appendices.
^
Trait sadism should not be confused with the separate concept of Sexual Sadism Disorder. Trait sadism is not a recognized clinical disorder and is not about sexual preferences.
^
For details on this point, please see the appendices.
^
Psychopathy can be factorized in a number of ways, but it is usually divided into either two or three factors.
^
Note that affective empathy is distinct from cognitive empathy. Affective empathy is the (involuntary) tendency to vicariously experience others’ emotions. Cognitive empathy is the ability to understand other people’s intentions, thoughts, and beliefs.
^
We didn’t examine Machiavellianism in significant detail because there appears to be substantial overlap between it and psychopathy. For example, planfulness has been proposed as an important distinguishing factor of Machiavellianism, which helps to separate it from psychopathy, yet the most established measure of Machiavellianism (MACH-IV) fails to measure planfulness (Miller et al., 2016). See Kückelhaus et al. (2020) for a five-factor measure of Machiavellianism and Monaghan et al. (2020) for a two-dimensional measure.
^
Grandiose narcissism appears more dangerous from a longtermist perspective. However, it’s important to note that these two types are not mutually exclusive, and that individuals can transition between grandiose and vulnerable states (e.g., Oltmanns & Widiger, 2018). In grandiose narcissism, the antagonistic self-centredness is accompanied by boldness, exhibitionism, and high self-esteem; whereas in vulnerable narcissism, the antagonistic self-centredness is accompanied by reactivity, low self-esteem, and susceptibility to envy (Krizan & Herlache, 2018).
^
The construct has been mentioned by multiple historians, but is not frequently studied in the general population or in psychiatric research. Indeed, Goldner-Vukov and Moore (2010) note that malignant narcissism has been “largely ignored in psychiatric literature and research.”
^
This has been measured through various different scales, including the Vengeance Scale (Stuckless & Goranson, 1992) and the Brief Scale of Vengeful Tendencies (Flores-Camacho, Castillo-Verdejo, & Penagos-Corzo, 2022), among others.
^
Please see the appendices for more details about this concept.
^
This can contribute to a phenomenon called ressentiment (a philosophical term for a form of resentment directed at particular groups in society).
^
To be more specific, D does not capture all of the variation in specific dark traits, and (at least in theory) someone might be much more of an outlier on a specific individual dark trait than they are on D, for example.
^
For example, in terms of clinically concerning traits, people are usually only classified as having Narcissistic Personality Disorder if they are distressed by their symptoms. However, people who exhibit symptoms without being distressed by them are plausibly even more concerning from a longtermist perspective.
In terms of forensically-relevant traits, the traits that correlate with someone being convicted of a crime are unlikely to be the same traits that would pose the highest risks to the long-term future. For example, in the Psychopathy Checklist Revised (PCL-R), often referred to as the “gold standard” for diagnosing psychopathy, one could, in theory, fulfill all the criteria of the callous-interpersonal factor but none of the antisocial-impulsive factor (or vice versa) and fail to be recognized as a psychopath. Hence, the data on identified psychopaths are necessarily about individuals who have some of both factors. Unfortunately, someone who scores highly only on the callous-interpersonal factor but who is not impulsive is likely more dangerous from a longtermist perspective.
^
For a discussion of what we mean by “concerning,” please see the appendices.
^
This approach becomes even more compelling when considering other current world leaders like Vladimir Putin or Kim Jong Un, whose actions strongly suggest high levels of malevolent traits.
^
Having said this, the relationship between the prevalence of malevolent individuals and the aggregate harm they cause is not straightforward. Please see the appendices for further discussion.
^
Approach (A) would be particularly costly if one were to publicly talk about the levels of malevolence of individuals about whom one had inconclusive information. On this note, tips for cautiously gathering further information are given in the appendices. (Approach (B) might also be somewhat useful in that it could give you an idea of how commonly you might expect to run into people with high levels of malevolent traits in your life, but only if you expect your social circles to be similar to the participants in the studies cited below, or if you can make reasonable guesses as to how your social circles might differ from such participants.)
^
The samples were from North America (USA and Canada: 43.75%); the United Kingdom and Australia (31.25%); and mainland Europe (Sweden, Belgium, and Portugal: 25%).
^
In addition to managers and executives, these groups also included procurement and supply professionals, and advertising workers.
^
NPD mostly captures clinical levels of grandiose narcissism; it does not represent vulnerable narcissism as well.
^
Please also see the appendices.
^
Winsper et al. (2020) conducted a meta-analysis assessing the prevalence of all DSM-IV personality disorders worldwide. There were a total of 19 studies (n = 11 from higher-income countries, and n = 8 from lower-and middle-income countries) included in the meta-analytic estimate of the prevalence of Cluster B personality disorders (into which NPD is classified).
^
E.g., O’Reilly & Chatman (2020).
^
The research group which has collected the most data on dark personality traits (to the best of our knowledge) maintains a website called DarkFactor.org. Per personal communication with one of their team (Prof. Ben Hilbig), the largest single dataset they’ve made publicly available on the Open Science Framework website is here. The dataset was used for a recent paper (Moshagen et al., 2024).
^
Interestingly, the item “I would like to make some people suffer, even if it meant that I would go to hell with them.” is not labeled as being about sadism—that one is instead labeled as being about “Frustralia,” which has been described as a facet of “amoralism” consisting of rationalization “and projection of amoral impulses, Machiavellianism, and resentment.”
^
Firstly, people aren’t necessarily taking the items literally. (See also this discussion.) Secondly, some may (also) not be responding sincerely even if they do take the items literally. Some may be failing to report their level of agreement with some statements for social desirability reasons. Due to the undesirability of agreeing with sadistic statements, the survey results reported here may represent underestimates.
^
At the end of the survey, you are asked if you would like your responses deleted and not used for research (so you can test out the survey without negatively affecting their research). A screenshot of that question is included in the appendices.
^
For example, Bader et al. (2021) used one sample from this website which was 48% female, 50% male, 2% other; mean age = 30.47, SD = 11.37. The sample was very diverse, with participants from over 100 countries. The sample is thus arguably more representative of the general population than most study samples which primarily consist of undergraduates or MTurk/Prolific participants.
^
To see the results someone would get if they answered in an equivalent fashion for everything in the D70, you can click here. The overall percentile for the Dark Factor would be the same (97th percentile), but if they’d answered the D70, they’d also receive a breakdown of their percentiles for individual traits. We disagree on a conceptual level with the choice of subtraits that some of the individual items were grouped under (for example, see the earlier discussion of the items that reflect sadistic preferences and psychopathy). So, rather than placing weight on the subtrait-level percentiles, in the earlier sections of this piece we reported the distributions of responses for a smaller subset of individual items that we selected as the most relevant.
^
Younger, more right‐leaning, and more ideologically extreme politicians also had higher D-scores.
^
In their first study on 397 US college-educated adults in full-time employment, they found a small but significant positive relationship between higher narcissism scores on the NPI-16 and higher levels of reported general political behavior (e.g., “People in this organization attempt to build themselves up by tearing others down”) (β = 0.05; p < .01).
In their second study on 151 US adults (none of whom participated in the prior study) they found that NPI-13 score significantly predicted engagement in self-interested political actions (β = 0.17; p < .01). Although these are small studies, they are suggestive of a general tendency for individuals higher in narcissism to demonstrate higher interest in politics and political behavior within their organizations.
In their third study, they found that NPI-13 scores significantly predicted political behavior (NPI-13, β = 0.19; p < .01) and the overall measure of political skill (NPI-13, β = 0.09, p < .05). However, these findings need to be interpreted with caution, because the accuracy of these self-report measures may have been affected by the respondents’ levels of narcissism.
^
Some of the new measures (such as the Five Factor Machiavellianism Inventory, also known as the FFMI) also show a positive association with ‘intrinsic aspirations’ (i.e., meaningful relationships, personal growth, and contributing to their community; Collison et al., 2018).
^
Many authors use the term “malignant narcissism” to describe the personality structure of tyrants and dictators (e.g., Burkle, 2015; Glad, 2002; George & Short, 2018; Goldner-Vukov & Moore, 2010). Prominent historical examples of leaders that have been described as sadistic include Hitler, Mao, Stalin, Kim Jong-il and Saddam Hussein (Coolidge & Segal, 2007) and Kim Jong Un (Immelman, 2018). Several graphic examples are also provided in Glad’s (2002) review.
^
Meta-analytic reviews and papers have confirmed that narcissism, but not psychopathy, is positively associated with extraversion (e.g., Muris et al., 2017; Vize et al., 2016; Watson et al., 2019).
^
For example, this has been found in a large round-robin design investigating first impressions in 2,628 dyads (Back, Schmukle & Egloff, 2010).
^
Leckelt et al. (2019) administered the NARQ-S to a large representative sample of German adults (n = 1,920) via the Innovation Sample of the Socioeconomic Panel (SOEP-IS; Richter & Schupp, 2012) in both 2013 and 2015, with a total of 1,526 adults completing the NARQ-S at both timepoints. Among other things, participants were also asked if they had a leadership role, via the following question: “In your position at work, do you supervise others? In other words, do people work under your direction?” (Participants were deemed to have a leadership position if they answered in the affirmative.) The authors then used cross-lagged panel models (CLPM) to examine the extent to which prior levels of narcissistic admiration and rivalry predict later outcomes, controlling for prior levels of the variables, the stability of narcissism and the outcomes, and the shared variance between admiration and rivalry. (The assumptions for using CLPM were checked and met; however, also see Hamaker et al., 2015.) They found that higher levels of narcissistic admiration (but not rivalry) predicted a higher likelihood of being in a leadership position two years later (residualised regression coefficient β = 0.13; p = 0.024).
^
For example, narcissists may be more likely to say that they’re in a higher position in their organization or that they have a higher number of people reporting directly to them than they really do (which might inflate the apparent positive association between narcissism and being promoted). On the other hand, they might also be less likely to admit to some of their narcissistic traits (which might reduce the positive association—H/T to Vanessa Sarre for noting this).
^
Internationalization refers to extending a firm’s activities to an international (as opposed to domestic) market (Adomako, Opoku, & Frimpong, 2017). In a study of 405 “top managers” (most of whom were the only person or one of only two people in the “top management team” of their company) of small and medium-sized enterprises (SMEs), there was a small correlation between self-reported dark triad scores and self-reported degree of firm internationalization (r = 0.22, p < 0.01) (Nooshabadi, Mockaitis, & Chugh, 2024).
^
Along a similar theme, in a small study of 304 British men (all aged 48), all of whom had undergone social interviews and a psychiatric interview, narcissistic personality disorder had a regression coefficient of 0.19 (p = 0.003) in a multivariate linear regression analysis predicting “status and wealth.” This was one of two life-success factors; the variables loading on it included social class (0.77), income (0.75), number of rooms in home (0.65), supervision of others at work (0.63), and home ownership (0.58) (Ullrich, Farrington, & Coid, 2007).
^
For more ideas regarding political interventions, please see here.
^
Other examples of people who have raised concerns about OpenAI are Leopold Aschenbrenner (who was eventually fired), Geoffrey Hinton, Geoffrey Irving, and the previous board.
^
For examples of some of the most important updates from that piece, see the section entitled, “How I reported this story.”
^
In addition to providing better support for whistleblowers (mentioned earlier), this could include providing histories, case studies, and written guidelines for individuals and organizations who suspect that they may be dealing/working with someone with elevated malevolent traits/behaviors, as well as (where possible) mentorship or similar forms of support provided by people who’ve successfully reduced the influence of or ousted a malevolent person from their organization. If you are interested in putting this idea into practice, please get in touch with us.
^
For example, is moral enhancement worth investigating further? (H/T to Oscar Delaney for raising this point here.)
^
Also see the following conversation with Amanda Askell about an LLM’s personality or character: “What should an AI’s personality be?” (published by Anthropic).
^
For example, a study with limited sample size by da Silva et al. (2021) found that a program based on compassion-focused therapy reduced psychopathic traits among male detained youth more than standard therapy. As an example of an intervention that appeared to drive dark traits in the opposite direction, Themelidis & Davies (2021) found that watching images depicting accidental or self-inflicted injuries from everyday objects was associated with increases in self-reported sadism. If this effect replicates, what is the mechanism underlying this finding? And might there be ways to reduce sadism by doing the opposite (e.g., watching images depicting caring or helping behavior)?
^
Some possibly-contrived examples of this: if an actor had switched from loving to hating a particular group, or if their apparent promotion of prosocial values had successfully got them into power but had always been partially just to boost their ego or to satisfy their hunger for power (such that their real values were slightly misaligned with their reported values), these could be examples of “near misses.” In other words, in these hypothetical cases, someone is almost aligned with (an ~idealized set of) human values, but not quite—and such a (subtle) misalignment could potentially be more damaging than complete misalignment (partly because complete misalignment with human values is probably more conducive to getting a criminal record than to getting into positions of power in society).

David Althaus, Chi Nguyen and Clare

Crossposted from EA Forum (103 points, 5 comments)

cousin_it 23 Oct 2024 15:47 UTC
5 points
1
I’m not sure focusing on individual evil is the right approach. It seems to me that most people become much more evil when they aren’t punished for it. A lot of evil is done by organizations, which are composed of normal people but can “normalize” the evil and protect the participants. (Insert usual examples such as factory farming, colonialism and so on.) So if we teach AIs to be as “aligned” as the average person, and then AIs increase in power beyond our ability to punish them, we can expect to be treated as a much-less-powerful group in history—which is to say, not very well.
- David Althaus 24 Oct 2024 9:18 UTC
  4 points
  0
  Parent
  I agree that the problem of “evil” is multifactorial with individual personality traits being only one of several relevant factors, with others like “evil/fanatical ideologies” or misaligned incentives/organizations plausibly being overall more important. Still, I think that ignoring the individual character dimension is perilous.
  It seems to me that most people become much more evil when they aren’t punished for it. [...] So if we teach AIs to be as “aligned” as the average person, and then AIs increase in power beyond our ability to punish them, we can expect to be treated as a much-less-powerful group in history—which is to say, not very well.
  Makes sense. On average, power corrupts / people become more malevolent if no one holds them accountable—but again, there seem to exist interindividual differences with some people behaving much better than others even when having enormous power (cf. this section).
  - cousin_it 24 Oct 2024 14:28 UTC
    3 points
    0
    Parent
    I’m afraid in a situation of power imbalance these interpersonal differences won’t matter much. I’m thinking of examples like enclosures in England, where basically the entire elite of the country decided to make poor people even poorer, in order to enrich themselves. Or colonialism, which lasted for centuries with lots of people participating, and the good people in the dominant group didn’t stop it.
    
    To be clear, I’m not saying there are no interpersonal differences. But if we find ourselves at the bottom of a power imbalance, I think those above us (even if they’re very similar to humans) will just systemically treat us badly.
    - David Althaus 27 Oct 2024 8:42 UTC
      4 points
      0
      Parent
      Thanks, I mostly agree.
      
      But even in colonialism, individual traits played a role. For example, compare King Leopold II’s rule over the Congo Free State vs. other colonial regimes.
      
      While all colonialism was exploitative, under Leopold’s personal rule the Congo saw extraordinarily brutal policies, e.g., his rubber quota system led soldiers to torture and cut off the hands of workers, including children, who failed to meet quotas. Under his rule, 1.5 to 15 million Congolese people died, out of a total population of only around 15 to 20 million. The brutality was so extreme that it caused public outrage which led other colonial powers to intervene until the Belgian government took control over the Congo Free State from Leopold.
      
      Compare this to, say, British colonial administration during certain periods which, while still overall morally reprehensible, saw much less barbaric policies under some administrators who showed basic compassion for indigenous people. For instance, Governor William Bentinck in India abolished practices like sati (widows burning themselves alive) and implemented other humanitarian reforms.
      
      One can easily find other examples (e.g. sadistic slave owners vs. more compassionate slave owners).
      
      In conclusion, I totally agree that power imbalances enabled systemic exploitation regardless of individual temperament. But individual traits significantly affected how much suffering and death that exploitation created in practice.^[1]
      ^
      Also, slavery and colonialism were ultimately abolished (in the Western world). My guess is that those who advocated for these reforms were, on average, more compassionate and less malevolent than those who tried to preserve these practices. Of course, the reformers were also heavily influenced by great ideas like the Enlightenment / classical liberalism.
      - cousin_it 27 Oct 2024 9:42 UTC
        2 points
        0
        Parent
        The British weren’t much more compassionate. North America and Australia were basically cleared of their native populations and repopulated with Europeans. Under British rule in India, tens of millions died from many famines, which instantly stopped after independence.
        
        Colonialism didn’t end due to benevolence. Wars for colonial liberation continued well after WWII and were very brutal, the Algerian war for example. I think the actual reason is that colonies stopped making economic sense.
        
        So I guess the difference between your view and mine is that I think colonialism kept going basically as long as it benefited the dominant group. Benevolence or malevolence didn’t come into it much. And if we get back to the AI conversation, my view is that when AIs become more powerful than people and can use resources more efficiently, the systemic gradient in favor of taking everything away from people will be just way too strong. It’s a force acting above the level of individuals (hmm, individual AIs) - it will affect which AIs get created and which ones succeed.
    - Viliam 25 Oct 2024 14:42 UTC
      2 points
      0
      Parent
      I don’t have enough data about it, but I think it is possible that these horrible mass behaviors start by some dark individuals doing it first… and others gradually joining them after observing that the behavior wasn’t punished, and maybe that they kinda need to do the same thing in order to remain competitive.
      In other words, the average person is quite happy to join some evil behavior that is socially approved, but there are individuals who are quite happy to initiate it. Removing those individuals from the positions of power could stop many such avalanches.
      (In my model, the average person is kinda amoral—happy to copy most behaviors of their neighbors, good and bad alike—and then we have small fractions of genuinely good and genuinely bad people, who act outside the Overton window; plus we can make the society better or worse by incentives and propaganda. For example, punishing bad behavior will deter most people, and stories about heroes will inspire some.)
      EDIT:
      For example, you mention colonialism. Maybe most people approved of it, but only some of them made the decisions and organized it. Remove the organizers, and there is no colonialism. More importantly, I think that most people approved of having the colonies simply because it was the status quo. The average person’s moral compass could probably be best described as “don’t do weird things”.
      - cousin_it 25 Oct 2024 21:07 UTC
        4 points
        0
        Parent
        I think a big part of the problem is that in a situation of power imbalance, there’s a large reward lying around for someone to do bad things—plunder colonies for gold, slaves, and territory; raise and slaughter animals in factory farms—as long as the rest can enjoy the fruits of it without feeling personally responsible. There’s no comparable gradient in favor of good things (“good” is often unselfish, uncompetitive, unprofitable).
        Viliam 26 Oct 2024 13:24 UTC
        2 points
        0
        Parent
        In theory, the reward for doing good should be prestige. (Which in turn may translate to more tangible rewards.) But that mostly works in small groups and doesn’t scale well.
        Some aspect of this seems like a coordination problem. Whatever your personal definition of “good” is, you would probably approve of a system that gives good people some kind of prestige, at least among other good people.
        For example, people may disagree about whether veganism is good or bad, but from a perspective of a vegan, it would be nice if vegans could have some magical “vegan mark” that would be unfalsifiable and immediately visible to other vegans. That way, you could promote your values not just by practicing and preaching your values, but also by rewarding other people who practice the same values. (For example, if you sell some products, you could give discounts to vegans. If many people start doing that, veganism may become more popular. Perhaps some people would criticize that as doing things for the wrong reasons, but the animals probably wouldn’t mind.) Similarly, effective altruists would approve of rewarding effective altruists, open source developers would approve of rewarding open source developers, etc.
        These things exist to some degree (e.g. open source developers can put a link to their projects in a profile), but often the existing solutions don’t scale well. If you only have a dozen effective altruists, they know each other by name, but if you get thousands, this stops working.
        One problem here is the association of “good” with “unselfish” and “non-judgmental”, which suggests that good people rewarding other good people is somehow… bad? In my opinion, we need to rethink that, because from the perspective of incentives and reinforcement, that is utterly stupid. The reason for these memes is that the past attempts to reward good often led to… people optimizing to be seen as good, rather than actually being good. That is a serious problem that I don’t know how to solve; I just have a strong feeling that going to the opposite extreme is not the right answer.
- Mo Putera 24 Oct 2024 6:15 UTC
  3 points
  0
  Parent
  I agree; Eichmann in Jerusalem and immoral mazes come to mind.
romeostevensit 24 Oct 2024 18:16 UTC
4 points
0
I think this is an important topic and am glad to see substantial scholarship efforts on it.

Wrt AI relevance: I think the meme that it matters a lot who builds the self-activating doomsday device has potentially done quite a bit of harm and may be a main contributor to what kills us.

Wrt people detecting these traits: I personally feel that the self domestication of humans has made us easier targets for such people, and undermined our ability to even think of doing anything about them. I don’t think this is entirely random.
David Gross 23 Oct 2024 16:07 UTC
4 points
1
shame—no need to exacerbate such feelings if it can be avoided
Shame may be an important tool that people with dark traits can leverage to overcome those traits. Exacerbating it may in some cases be salutary.
- David Althaus 24 Oct 2024 9:20 UTC
  2 points
  0
  Parent
  Thanks, good point! I suppose it’s a balancing act and depends on the specifics in question and the amount of shame we dole out. My hunch would be that a combination of empathy and shame (“carrot and stick”) may be best.
ZY 23 Oct 2024 21:45 UTC
2 points
0
Amazingly detailed article covering malevolence, its interaction with power, and the other nuances! I have been thinking of exploring similar topics, and found this very helpful. Besides the identified research questions, some of which I strongly agree with, one additional question I was wondering about is: does self-awareness of one’s own malevolence factors help one to limit those factors? If so, how effective would that be? How would this change when the person has power?
- Viliam 11 Nov 2024 15:48 UTC
  3 points
  0
  Parent
  does self-awareness of one’s own malevolence factors help one to limit those factors?
  Probably the effect would be nonlinear, like the evil people would just laugh, the average might get depressed and give up, and the mostly-good would strive to achieve perfection (or conclude that they are already good enough compared to others, and relax their efforts?).
  - ZY 11 Nov 2024 18:30 UTC
    1 point
    0
    Parent
    True. I wonder whether, for average people, being self-aware would at least unconsciously act as a partial “blocker” on the next malevolent action they might take, and whether that may evolve over time too (even if it may take a bit longer than for a mostly-good person).