I agree that the problem of “evil” is multifactorial, with individual personality traits being only one of several relevant factors; others, like “evil/fanatical ideologies” or misaligned incentives and organizations, are plausibly more important overall. Still, I think that ignoring the individual character dimension is perilous.
It seems to me that most people become much more evil when they aren’t punished for it. [...] So if we teach AIs to be as “aligned” as the average person, and then AIs increase in power beyond our ability to punish them, we can expect to be treated the way much-less-powerful groups have been treated throughout history—which is to say, not very well.
Makes sense. On average, power corrupts: people become more malevolent when no one holds them accountable. But again, there seem to be interindividual differences, with some people behaving much better than others even when they hold enormous power (cf. this section).
I’m afraid that in a situation of power imbalance these interpersonal differences won’t matter much. I’m thinking of examples like the enclosures in England, where basically the entire elite of the country decided to make poor people even poorer in order to enrich themselves. Or colonialism, which lasted for centuries with lots of people participating, and the good people in the dominant group didn’t stop it.
To be clear, I’m not saying there are no interpersonal differences. But if we find ourselves at the bottom of a power imbalance, I think those above us (even if they’re very similar to humans) will just systemically treat us badly.
Thanks, I mostly agree.

But even in colonialism, individual traits played a role. For example, compare King Leopold II’s rule over the Congo Free State with other colonial regimes.
While all colonialism was exploitative, under Leopold’s personal rule the Congo saw extraordinarily brutal policies: his rubber quota system, for example, led soldiers to torture and cut off the hands of workers, including children, who failed to meet quotas. Under his rule, 1.5–15 million Congolese people died, while the total population was only around 15 to 20 million. The brutality was so extreme that it caused international outrage, which led other colonial powers to apply pressure until the Belgian government took control of the Congo Free State away from Leopold.
Compare this to, say, British colonial administration during certain periods, which, while still morally reprehensible overall, saw far less barbaric policies under some administrators who showed basic compassion for indigenous people. For instance, Governor-General William Bentinck in India abolished practices like sati (widows burning themselves alive) and implemented other humanitarian reforms.
One can easily find other examples (e.g. sadistic slave owners vs. more compassionate slave owners).
In conclusion, I totally agree that power imbalances enabled systemic exploitation regardless of individual temperament. But individual traits significantly affected how much suffering and death that exploitation created in practice.[1]
Also, slavery and colonialism were ultimately abolished (in the Western world). My guess is that those who advocated for these reforms were, on average, more compassionate and less malevolent than those who tried to preserve these practices. Of course, the reformers were also heavily influenced by great ideas like the Enlightenment and classical liberalism.
The British weren’t much more compassionate. North America and Australia were basically cleared of their native populations and repopulated with Europeans. Under British rule in India, tens of millions died in repeated famines, which stopped almost immediately after independence.
Colonialism didn’t end due to benevolence. Wars of colonial liberation continued well after WWII and were very brutal (the Algerian War, for example). I think the actual reason is that colonies stopped making economic sense.
So I guess the difference between your view and mine is that I think colonialism kept going basically as long as it benefited the dominant group; benevolence or malevolence didn’t come into it much. And to get back to the AI conversation, my view is that when AIs become more powerful than people and can use resources more efficiently, the systemic gradient in favor of taking everything away from people will just be way too strong. It’s a force acting above the level of individuals (hmm, individual AIs): it will affect which AIs get created and which ones succeed.
I don’t have enough data on this, but I think it is possible that these horrible mass behaviors start with a few dark individuals doing it first… and others gradually joining them after observing that the behavior wasn’t punished, and perhaps feeling that they need to do the same in order to remain competitive.

In other words, the average person is quite happy to join some evil behavior that is socially approved, but there are individuals who are quite happy to initiate it. Removing those individuals from positions of power could stop many such avalanches.
(In my model, the average person is kinda amoral—happy to copy most behaviors of their neighbors, good and bad alike—and then we have small fractions of genuinely good and genuinely bad people who act outside the Overton window; plus we can make society better or worse through incentives and propaganda. For example, punishing bad behavior will deter most people, and stories about heroes will inspire some.)
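To make this model concrete, here is a toy simulation. To be clear, the setup is my own illustrative assumption (conformists copy one randomly observed agent per step, a handful of fixed “initiators” always defect, a handful of fixed “holdouts” never do, and punishment is a per-step chance of being caught and deterred); it is a sketch of the avalanche dynamic, not a claim about real societies:

```python
import random

def simulate(n=500, steps=200, bad_seeds=5, good_seeds=5,
             punish_prob=0.0, seed=0):
    """Toy avalanche model. Each step, every conformist copies one
    randomly observed agent; 'bad_seeds' agents always defect (the
    initiators), 'good_seeds' agents never do (the holdouts), and a
    defecting conformist is caught and reverts with probability
    punish_prob. Returns the final fraction of defectors."""
    rng = random.Random(seed)
    state = [0] * n                 # 0 = behaves well, 1 = defects
    bad = set(range(bad_seeds))
    good = set(range(bad_seeds, bad_seeds + good_seeds))
    for i in bad:
        state[i] = 1
    for _ in range(steps):
        for i in range(n):
            if i in bad or i in good:
                continue            # the fixed minorities never change
            state[i] = state[rng.randrange(n)]  # copy a random agent
            if state[i] == 1 and rng.random() < punish_prob:
                state[i] = 0        # caught and deterred
    return sum(state) / n

print(simulate())                   # no punishment: defection spreads widely
print(simulate(punish_prob=0.3))    # punishment keeps it to a few percent
print(simulate(bad_seeds=0))        # no initiators: the avalanche never starts
```

In runs like these, the unpunished population drifts toward large-scale defection, modest punishment keeps defection rare, and removing the initiators prevents the cascade entirely, which matches the intuition above: the conformist majority amplifies whatever an unpunished minority starts.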
EDIT:
For example, you mention colonialism. Maybe most people approved of it, but only some of them made the decisions and organized it. Remove the organizers, and there is no colonialism. More importantly, I think that most people approved of having the colonies simply because it was the status quo. The average person’s moral compass could probably be best described as “don’t do weird things”.
I think a big part of the problem is that in a situation of power imbalance, there’s a large reward lying around for someone to do bad things—plunder colonies for gold, slaves, and territory; raise and slaughter animals in factory farms—as long as the rest can enjoy the fruits of it without feeling personally responsible. There’s no comparable gradient in favor of good things (“good” is often unselfish, uncompetitive, unprofitable).
In theory, the reward for doing good should be prestige. (Which in turn may translate to more tangible rewards.) But that mostly works in small groups and doesn’t scale well.
Some aspect of this seems like a coordination problem. Whatever your personal definition of “good” is, you would probably approve of a system that gives good people some kind of prestige, at least among other good people.
For example, people may disagree about whether veganism is good or bad, but from the perspective of a vegan, it would be nice if vegans could have some magical “vegan mark” that would be unforgeable and immediately visible to other vegans. That way, you could promote your values not just by practicing and preaching them, but also by rewarding other people who practice the same values. (For example, if you sell some products, you could give discounts to vegans. If many people start doing that, veganism may become more popular. Perhaps some people would criticize that as doing things for the wrong reasons, but the animals probably wouldn’t mind.) Similarly, effective altruists would approve of rewarding effective altruists, open source developers would approve of rewarding open source developers, etc.
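As an aside, a non-magical approximation of such an unforgeable mark is exactly what digital signatures provide: a trusted issuer signs membership attestations that anyone can verify but nobody can forge without the issuer’s private key. Below is a minimal sketch using the pyca/cryptography package; the issuer and member names are hypothetical, and it deliberately assumes away the hard question of who decides which members deserve a mark:

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Hypothetical trusted issuer (say, a vegan association) holds the only
# private key; its public key is distributed to everyone.
issuer_key = Ed25519PrivateKey.generate()
issuer_pub = issuer_key.public_key()

def issue_mark(member_id: str) -> bytes:
    """Sign an attestation; only the issuer's private key can do this."""
    return issuer_key.sign(f"member:{member_id}".encode())

def verify_mark(member_id: str, mark: bytes) -> bool:
    """Anyone with the issuer's public key can check a mark instantly."""
    try:
        issuer_pub.verify(mark, f"member:{member_id}".encode())
        return True
    except InvalidSignature:
        return False

mark = issue_mark("alice")
print(verify_mark("alice", mark))    # True: a genuine mark
print(verify_mark("mallory", mark))  # False: marks can't be transferred or forged
```

The cryptography only makes the mark unforgeable; whether it tracks actual goodness depends entirely on the issuer, which is the Goodhart-style problem raised below.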
These things exist to some degree (e.g., open source developers can put a link to their projects in a profile), but the existing solutions often don’t scale well. If you only have a dozen effective altruists, they know each other by name; once you have thousands, this stops working.
One problem here is the association of “good” with “unselfish” and “non-judgmental”, which suggests that good people rewarding other good people is somehow… bad? In my opinion, we need to rethink that, because from the perspective of incentives and reinforcement it is utterly stupid. The reason for these memes is that past attempts to reward good often led to… people optimizing to be seen as good rather than actually being good. That is a serious problem that I don’t know how to solve; I just have a strong feeling that going to the opposite extreme is not the right answer.