Morality is Scary
I’m worried that many AI alignment researchers and other LWers have a view of how human morality works that really only applies to a small fraction of all humans (notably, moral philosophers and themselves). In this view, people know or at least suspect that they are confused about morality, and are eager or willing to apply reason and deliberation to find out what their real values are, or to correct their moral beliefs. Here’s an example of someone who fits this view:
I’ve written, in the past, about a “ghost” version of myself — that is, one that can float free from my body; which can travel anywhere in all space and time, with unlimited time, energy, and patience; and which can also make changes to different variables, and play forward/rewind different counterfactual timelines (the ghost’s activity somehow doesn’t have any moral significance).
I sometimes treat such a ghost kind of like an idealized self. It can see much that I cannot. It can see directly what a small part of the world I truly am; what my actions truly mean. The lives of others are real and vivid for it, even when hazy and out of mind for me. I trust such a perspective a lot. If the ghost would say “don’t,” I’d be inclined to listen.
I’m currently reading The Status Game by Will Storr (highly recommended BTW), and found in it the following description of how morality works in most people, which matches my own understanding of history and my observations of humans around me:
The moral reality we live in is a virtue game. We use our displays of morality to manufacture status. It’s good that we do this. It’s functional. It’s why billionaires fund libraries, university scholarships and scientific endeavours; it’s why a study of 11,672 organ donations in the USA found only thirty-one were made anonymously. It’s why we feel good when we commit moral acts and thoughts privately and enjoy the approval of our imaginary audience. Virtue status is the bribe that nudges us into putting the interests of other people – principally our co-players – before our own.
We treat moral beliefs as if they’re universal and absolute: one study found people were more likely to believe God could change physical laws of the universe than he could moral ‘facts’. Such facts can seem to belong to the same category as objects in nature, as if they could be observed under microscopes or proven by mathematical formulae. If moral truth exists anywhere, it’s in our DNA: that ancient game-playing coding that evolved to nudge us into behaving co-operatively in hunter-gatherer groups. But these instructions – strive to appear virtuous; privilege your group over others – are few and vague and open to riotous differences in interpretation. All the rest is an act of shared imagination. It’s a dream we weave around a status game.
The dream shifts as we range across the continents. For the Malagasy people in Madagascar, it’s taboo to eat a blind hen, to dream about blood and to sleep facing westwards, as you’ll kick the sunrise. Adolescent boys of the Marind of South New Guinea are introduced to a culture of ‘institutionalised sodomy’ in which they sleep in the men’s house and absorb the sperm of their elders via anal copulation, making them stronger. Among the people of the Moose, teenage girls are abducted and forced to have sex with a married man, an act for which, writes psychologist Professor David Buss, ‘all concerned – including the girl – judge that her parents giving her to the man was a virtuous, generous act of gratitude’. As alien as these norms might seem, they’ll feel morally correct to most who play by them. They’re part of the dream of reality in which they exist, a dream that feels no less obvious and true to them than ours does to us.
Such ‘facts’ also change across time. We don’t have to travel back far to discover moral superstars holding moral views that would destroy them today. Feminist hero and birth control campaigner Marie Stopes, who was voted Woman of the Millennium by the readers of The Guardian and honoured on special Royal Mail stamps in 2008, was an anti-Semite and eugenicist who once wrote that ‘our race is weakened by an appallingly high percentage of unfit weaklings and diseased individuals’ and that ‘it is the urgent duty of the community to make parenthood impossible for those whose mental and physical conditions are such that there is well-nigh a certainty that their offspring must be physically and mentally tainted’. Meanwhile, Gandhi once explained his agitation against the British thusly: ‘Ours is one continual struggle against a degradation sought to be inflicted upon us by the Europeans, who desire to degrade us to the level of the raw Kaffir [black African] … whose sole ambition is to collect a certain number of cattle to buy a wife with and … pass his life in indolence and nakedness.’ Such statements seem obviously appalling. But there’s about as much sense in blaming Gandhi for not sharing our modern, Western views on race as there is in blaming the Vikings for not having Netflix. Moral ‘truths’ are acts of imagination. They’re ideas we play games with.
The dream feels so real. And yet it’s all conjured up by the game-making brain. The world around our bodies is chaotic, confusing and mostly unknowable. But the brain must make sense of it. It has to turn that blizzard of noise into a precise, colourful and detailed world it can predict and successfully interact with, such that it gets what it wants. When the brain discovers a game that seems to make sense of its felt reality and offer a pathway to rewards, it can embrace its rules and symbols with an ecstatic fervour. The noise is silenced! The chaos is tamed! We’ve found our story and the heroic role we’re going to play in it! We’ve learned the truth and the way – the meaning of life! It’s yams, it’s God, it’s money, it’s saving the world from evil big pHARMa. It’s not like a religious experience, it is a religious experience. It’s how the writer Arthur Koestler felt as a young man in 1931, joining the Communist Party:
‘To say that one had “seen the light” is a poor description of the mental rapture which only the convert knows (regardless of what faith he has been converted to). The new light seems to pour from all directions across the skull; the whole universe falls into pattern, like stray pieces of a jigsaw puzzle assembled by one magic stroke. There is now an answer to every question, doubts and conflicts are a matter of the tortured past – a past already remote, when one lived in dismal ignorance in the tasteless, colourless world of those who don’t know. Nothing henceforth can disturb the convert’s inner peace and serenity – except the occasional fear of losing faith again, losing thereby what alone makes life worth living, and falling back into the outer darkness, where there is wailing and gnashing of teeth.’
I hope this helps further explain why I think even solving (some versions of) the alignment problem probably won’t be enough to ensure a future that’s free from astronomical waste or astronomical suffering. A part of me is actually more scared of many futures in which “alignment is solved” than of a future in which biological life is simply wiped out by a paperclip maximizer.
I think this post makes an important point—or rather, raises a very important question, with some vivid examples to get you started. On the other hand, I feel like it doesn’t go much further than that, and probably should have—I wish it had, for example, sketched a concrete scenario in which the future is dystopian not because we failed to make our AGIs “moral” but because we succeeded, or gotten a bit more formal and complemented the quotes with a toy model (inspired by the quotes) of how moral deliberation in a society might work under post-AGI-alignment conditions, and how that could systematically lead to dystopia unless we manage to be foresightful and set up the social conditions just right.
I recommend not including this post, and instead including this one and Wei Dai’s exchange in the comments.