The Social Alignment Problem
TLDR: I think public outreach is a very hopeful path to victory. More importantly, extended large-scale conversation around questions such as whether public outreach is a hopeful path to victory would be very likely to decrease p(doom).
You’re a genius mechanical engineer in prison. You stumble across a huge bomb rigged to blow in a random supply closet. You shout for two guards passing by, but they laugh you off. You decide to try to defuse it yourself.
This is arguably a reasonable response, given that this is your exact skill set. This is what you were trained for. But after a few hours of fiddling around with the bomb, you start to realize that it’s much more complicated than you thought. You have no idea when it’s going to go off, but you start despairing that you can defuse it on your own. You sink to the floor with your face in your hands. You can’t figure it out. Nobody will listen to you.
Real Talking To The Public has never been tried
Much like the general public has done with the subject of longevity, I think many people in our circle have adopted an assumption of hopelessness toward public outreach (call it social alignment) before a relevant amount of effort has been expended. In truth, there are many reasons to expect this strategy to be quite realistic, and very positively impactful too. A world in which the cause of AI safety is as trendy as the cause of climate change, and in which society is as knowledgeable about questions of alignment as it is about vaccine efficacy (meaning not even that knowledgeable), is one where sane legislation designed to slow capabilities and invest in alignment becomes probable, and where capabilities research is stigmatized and labs find access to talent and resources harder to come by.
I’ve finally started to see individual actors taking steps towards this goal, but I’ve seen a shockingly small amount of coordinated discussion about it. When the topic is raised, there are four common objections: They Won’t Listen, Don’t Cry Wolf, Don’t Annoy the Labs, and Don’t Create More Disaster Monkeys.
They won’t listen/They won’t understand
I cannot overstate how utterly false this is at this point.
It’s understandable that this has been our default belief. I think debating e/accs on Twitter has broken our brains. Explaining again and again why something smarter than you that doesn’t care about you is dangerous, and being met with these arguments every time, is soul-crushing. It made sense to expect that if it’s this hard to explain to a fellow computer enthusiast, then there’s no hope of reaching the average person. For a long time I avoided talking about it with my non-tech friends (let’s call them “civilians”) for that reason. However, when I finally did, it felt like the breath of life. My hopelessness broke, because they instantly and vigorously agreed, even finishing some of my arguments for me. Every single AI safety enthusiast I’ve spoken with who has engaged with civilians has had the exact same experience. I think it would be very healthy for anyone who is still pessimistic about convincing people to just try talking to one non-tech person in their life about this. It’s an instant shot of hope.
The truth is, if we were to decide that getting the public on our side is our goal, I think we would have one of the easiest jobs any activists (sorry, social alignment researchers) have ever had.
Far from being closed to the idea, civilians in general literally already get it. It turns out, Terminator and the Matrix have been in their minds this whole time. We assumed they’d been inoculated against serious AI risk concern—turns out, they walked out of the theaters thinking “wow, that’ll probably happen someday”. They’ve been thinking that the entire time we’ve been agonizing about nobody understanding us. And now, ChatGPT has taken that “someday” and made it feel real.
At this point AI optimists are like the Black Knight from Monty Python. You can slice apart as many of their arguments as you want, but they can’t be killed. However, you can just go around them. We’re spending all our time and effort debating them and getting nowhere, when we could simply go around them to the hosts of civilians perfectly willing to listen.
The belief is already there. They just haven’t internalized it, like a casual Christian sinning even though their official internal belief is that they’re risking being tortured literally forever. They just need the alief.
A month ago, there had only been a handful of attempts at social alignment from us. Rob Miles has been producing accessible, high-quality content for half a decade. A petition was floated to shut down Bing, which we downvoted into oblivion. There was the Bankless podcast. There was the 6-month open letter, and then the TIME opinion piece and several podcast appearances. This wasn’t much effort as PR pushes go, and yet it kicked off a very appreciable news cycle that hasn’t yet ended (although there were unforced errors in messaging that more coordination likely could have avoided).
Additionally, it seems to me that the incentives of almost all relevant players already align with being open to the message of slowing progress (beyond the free-bingo-square incentive of not wanting to die).
Governments are eternally paranoid about any threats to their power. They have a monopoly on violence, and it shouldn’t take a five-star general to realize that a person, company, or state armed with a superhumanly intelligent adviser is one of the only realistic threats they face. It’s an obvious national security risk. They’re also motivated to follow the will of the people.
Huge numbers of civilians are currently in extreme danger of their jobs being abstracted away to language model X, let alone if capabilities continue progressing as they have been. This wave of automation will be unique because, instead of low-income workers, it will hit the people with the most money to contribute to political campaigns. There will be a short delay, but the looming threat alone should get people riled up in a very rare way, not to speak of when it actually starts happening in earnest.
Legacy companies without an AI lead are standing on the precipice of being disrupted out of existence. The climate change cause fought against trillions of dollars, because it was trying to change a status quo that at the time comprised all the world’s most valuable companies. Here, we’re more accurately said to be working to prevent the status quo from changing, meaning it seems there’s more likely to be lobby-ready money on our side than theirs. There will be plenty of money on the other side too, but I expect the situation to be an inversion of climate change.
(Tangent: I think it’s worth mentioning here that stigmatization also seems very relevant to the problem of Chinese AI enthusiasm. China has invested many resources into mitigating climate change risk in order to improve its global reputation. A future where AI capabilities research carries a heavy moral stigma globally and China decides to disinvest as a result isn’t entirely unrealistic. China has the additional incentive here that American companies are clearly ahead, so a global pause would benefit China just as it would benefit smaller companies wanting a chance to catch up. China would then be incentivized not to undermine an American pause.)
Don’t cry wolf/Preserve dry powder
The question of whether now is the time to seriously go public is a valid one. But the question assumes that at some point in the future it will be the correct time. This almost mirrors the AI risk debate itself: even if the crucial moment is in the future, it doesn’t make sense to wait until then to start preparing. A public-facing campaign can take months to plan and hone, and it seems like it makes sense to start preparing one now, even if we decide that now isn’t the correct moment.
We need to avoid angering the labs
Practically speaking, I’ve seen no evidence that the very few safety measures labs have taken have been for our benefit. Possibly, to some small extent, they’ve been for PR points because of public concern we’ve raised, but certainly not out of any loyalty or affection for us. The expected value of regulating them, or of imposing bottlenecks on their access to talent and resources via stigmatization of the field of capabilities research, seems much larger than the expected benefit of hoping that they’ll hold back because we’ve been polite.
Don’t create more idiot disaster monkeys
It’s true that we’re mostly in this situation because certain people heard about the arguments for risk and either came up with terrible solutions to them or smelled a potent fount of personal power. A very valid concern I’ve heard raised is that something similar could happen with governments, which would be an even worse situation than the one we’re in.
It seems unlikely that AI capabilities can advance much further without governments and other parties taking notice of their potential. If we could have a choice between them realizing the potential without hearing about the risks, or realizing the potential via hearing about the risks, the latter seems preferable. The more the public is convinced of the risk, the more incentivized governments are to act as though they are, too. Additionally, there doesn’t seem to be an alternative. Unaligned superintelligence approaches by default unless something changes.
Without concerted effort from us, there are two possible outcomes. Either the current news cycle fizzles out like the last ones did, or AI risk goes truly mainstream but we lose all control over the dialogue. If it fizzles out, there’s always a chance to start another one after the next generation of AI and another doom-dice roll, assuming we won’t just say the same thing then. But even then, much of our dry powder will be gone and our time much shorter. It’s hard to say how bad losing control over the dialogue could be; I don’t know how asinine the debate around this could get. But if we believe that our thinking about this topic tends to be more correct than the average person, retaining control over it should have a positive expected value.
Realistically, the latter failure appears much, much more likely. I’m fairly certain that this movement is in the process of taking off with or without us. There are a few groups already forming that are largely unaffiliated with EA/rationalism but are very enthusiastic. They’ve mostly heard of the problem through us, but they’re inviting people who haven’t, who will invite more people who haven’t. I’ve started to see individuals scared out of all reason, sounding more and more unhinged, because they have no guidance and nowhere to get it, at least until they find these groups. A very realistic possible future includes a large AI safety movement that we have no influence over, doing things we would never have sanctioned for goals we disagree with. Losing the ability to influence something once it becomes sufficiently more powerful than you: why does that sound familiar?
My Bigger Point: We Lack Coordination
You probably disagree with many things I’ve said, which brings me to my main point: questions like these haven’t been discussed enough for there to be much prior material to reference, let alone consensuses reached. I could be wrong about a lot of what I suggested; maybe going public is the wrong move, or maybe now isn’t the right time; but I wouldn’t know it because there is no extended conversation around real-world strategy. The point has been raised a couple times before that actions taken by individuals in our circle have been very uncoordinated. Every time this is raised, some people agree and a handful of comment chains are written, but then the conversation fizzles out and nothing results.
One very annoying practical consequence of this lack of coordination is that I never have any idea what prominent figures like Eliezer are thinking. It would have been extremely useful, for example, to know how his meeting with Sam Altman went, or whether he considers the famous tweet to be as indicative of personal cruelty as it seems, but I had to watch a podcast for the former and still don’t know the latter. It would have been useful for his TIME article to have been proofread by many people. It would currently be extremely useful to know what dialogue, if any, he’s having with Elon Musk (probably none, but if there is any, it changes the gameboard). I’m not wishing I could personally ask these questions; I’m wishing there were a public record of somebody asking him, after deeming them important datapoints for strategy. In general there seems to be no good way to cooperate with AI safety leadership.
I don’t like saying the phrase “we should”, but it is my strong belief that a universe in which a sizable portion of our dialogue and efforts is dedicated to ongoing, coordinated real-world strategizing is, ceteris paribus, much safer. It seems clear that this will be the case at some point; even most outreach-skeptics say only that now is too soon. And starting now can do nothing but maximize the time available.
To avoid passing the buck and simply hoping this time is different, I’ve set up the subreddit r/AISafetyStrategy to serve as a dedicated extended conversation about strategy for now, funded it with $1000 for operations, and am building a dedicated forum to replace it. I realize unilateral action like this is considered a little gauche on here. To be clear, I think these actions are very suboptimal; I would much prefer something with equivalent function to be set up with the approval and input of everyone here, and I hope something is created that supersedes my thing. Even simply adding a “strategy” tag to LessWrong would probably be better. But until that something better exists, feel free to join and contribute your strategy questions and ideas.
An important dimension of previous social movements involved giving concerned people concrete things to do so that emotional energy doesn’t get wasted and/or harm the person.
Well, the bomb story was a good hook, but it doesn’t seem to fit the rest very well. There are good points in here, so I weak-upvoted anyway.
Re cry wolf/dry powder: if there’s one thing I learned from our COVID experience, it’s that the timing of exponential changes is hard to call and reason about. Even when you’re right, you’re either going to be too early or too late. Too late is unacceptable in this case. ChatGPT made the mainstream finally take notice. The time to act is now.
The prisoner/bomb analogy is not useful, because there’s zero chance the prisoner is incorrect. The bomb is definitely there, and it’s just a matter of getting a guard to look, not a complicated multi-step set of actions that may or may not delay a maybe-fictional-or-dud bomb.
I’m deeply suspicious of “we should” (as you acknowledge you are), and I probably won’t put much effort into this, but I want to give you full credit and honor for actual attempts to improve things. The opposite of your apology for unilateral action: you deserve praise for action rather than discussion. All real action is unilateral to some extent; it cannot start as a full consensus of humanity, and it’s probably not going to get there, ever.
Wait, reverse the sequence of my pessimism and my praise. Good job, whether it works or not!
Yes, I should have been more clear that I was addressing people who have very high p(doom). The prisoner/bomb is indeed somewhat of a simplification, but I do think there’s a valid connection in the form of half-heartedly attempting to get the assistance of people more powerful than you and prematurely giving it up as hopeless.
Thank you for your kind words! I was expecting most reactions to be fairly anti-”we should”, but I figured it was worth a try.
I couldn’t agree more. Count me in. I’ll join the subreddit.
My one thought is that a good PR strategy should include ways to avoid creating polarization on the issue.
I know you saw my post “AI scares and changing public beliefs”, but I thought I’d link it here, as it contains a few thoughts on the polarization issue.
I’ve experienced this, but I’ve also seen people dependent on ChatGPT.
I’ve seen the latter but much more of the former.
On the subject of losing control of the discourse, this Tumblr post on the development of traditional social movements seems to have some relevant insights (this is not to say it’s 1:1 applicable): https://balioc.tumblr.com/post/187004568356/your-ideology-if-it-gets-off-the-ground-at-all
(Disclaimer: I’m newer to the alignment problem than most here, I’m not an experienced researcher, just sharing this in case it helps)
I’ve often heard and seen the divide people are talking about, where people involved in the tech world are a lot more skeptical than your average non-tech friend. I’m curious what people think the reason for this is. Is the main claim that people in tech have been lulled into a false sense of security by familiarity? Or perhaps that they look down on safety concerns as coming from a lay audience scared by Terminator or vague sci-fi ideas without understanding the technology deeply enough?
I honestly can’t say. I wish I could.
I’m a little new to the AI alignment field-building effort: would you put head researchers at OpenAI in this category?
Hmm, not necessarily the researchers, but the founders undoubtedly. OpenAI was specifically formed to increase AI safety.
Good article. I agree that we definitely need to try now, and it’s likely that if we don’t, another group will take over the narrative.
I also think that it is important for people to know what they are working towards as well as away from. Imagining what a positive Singularity would be like for them personally is something I think the general public should also start doing. Positive visions inspire people; we know that. To me it’s obvious that such a future would involve different groups with different values somewhat going their own ways. Thinking about it, that is about the only thing I can be sure of. Some people will obviously be much more enthusiastic about biological/tech enhancement than others, and of course about living off Earth. We agree that coherent extrapolated volition is important; it’s time we thought a bit about what its details are.