Slowing down AI progress is an underexplored alignment strategy
The emotional burden of watching the world end
My current beliefs about AI timelines have made my life significantly worse. I find thoughts about the ever-shrinking timeline to AGI invading nearly every aspect of my life. Every choice now seems to be affected by the trajectory of this horrible technology.
Based on the responses to this post and others, many if not most people on this forum have been affected in a similar way. AI is the black hole swallowing all other considerations about the future.
Frankly, I feel a good deal of anger about the current state of affairs. All these otherwise thoughtful and nice people are working to build potentially world-ending technology as fast as possible. I’m angry so few people are paying any attention to the danger. I’m angry that invasive thoughts about AGI destroying the world are making it harder for me to focus on work that might have a non-zero chance of saving it. And I’m frustrated we’ve been so closed-minded in our approach to solving this problem.
The current situation is almost exactly analogous to the creation of the atomic bomb during World War 2. There’s a bunch of very smart people gathered together, supplied with a fuckton of money by powerful groups, working towards the same destructive goal. There is social proof all around them that they are doing the right thing. There are superficially plausible reasons to think that everything will turn out fine. There is token engagement with the concerns raised by those worried about the implications of the technology. But at the end of the day, the combination of personal incentives, social proof and outside impetus makes everyone turn their heads and ignore the danger. On the rare occasion that people are convinced to leave, it’s almost always the most conscientious, most cautious people, ensuring the remaining team is even less careful.
There is so much magical thinking going on among otherwise intelligent people. Everyone seems to be operating on the assumption that no technology can destroy us, that everything will magically turn out fine, despite the numerous historical examples of new inventions destroying or nearly destroying the world (see the Cuban Missile Crisis, or the Great Oxidation Event, when oxygen-producing bacteria drove themselves and most other life on Earth to extinction and triggered a roughly 300-million-year ice age).
Frankly, the response from the EA/rationalist community has been pretty lackluster so far. Every serious solution that has been proposed revolves around solving alignment before we make AGI, yet I know ZERO people who are working on slowing down capabilities progress. Hell, until just a month ago, EA orgs like 80,000 Hours were RECOMMENDING people join AI research labs and work on CREATING superintelligence.
The justifications I’ve read for this behavior always seem to be along the lines of “we don’t want to alienate the people working at top AI orgs because we feel that will be counter-productive to our goals of convincing them that AI alignment is important.” Where has this strategy gotten us? Does the current strategy of getting a couple of members of the EA/Rationalist community onto the safety teams at major AI orgs actually have a chance at working? And is it worth foregoing all efforts to slow down progress towards AGI?
The goals of DeepMind, OpenAI, and all the other top research labs are fundamentally opposed to the goal of alignment. The founding goal of DeepMind is to “solve intelligence, then use that to solve everything else.” That mission statement has been operationalized as paying a bunch of extremely smart people ridiculous salaries to create and distribute the blueprints for (potentially) world-ending AGI as fast as humanly possible. That goal is fundamentally opposed to the goal of alignment because it burns the one common resource that all alignment efforts need to make a solution work: *time*.
We need to buy more time
In the latest Metaculus forecasts, we have 13 years left until some lab somewhere creates AGI, and perhaps far less than that until the blueprints to create it are published and nothing short of a full-scale nuclear war will stop someone somewhere from doing so. The community strategy (insofar as there even is one) is to bet everything on getting a couple of technical alignment folks onto the team at top research labs in the hopes that they will miraculously solve alignment before the mad scientists in the office next door turn on the doomsday machine.
While I admit there is at least a chance this might work, and it IS worth doing technical alignment research, the indications we have so far from the most respected people in the field are that this is an extremely hard problem and there is at least a non-zero chance it is fundamentally unsolvable.
There are a dozen other strategies we could potentially deploy to achieve alignment, but they all depend on someone not turning on the doomsday machine. Yet thus far we have almost completely ignored the class of strategies that might buy more time. The cutting edge of thought on this front seems to come from [one grumpy former EA founder on Twitter](https://twitter.com/KerryLVaughan) who isn’t even trying that hard.
Slow down AI with stupid regulations
We have a dozen examples of burdensome regulation stifling innovation and significantly slowing or even reversing progress in entire fields. In the US, we’ve managed to drive birth rates to below replacement levels and homelessness to record highs with literally nothing more than a few thousand grumpy NIMBYs showing up at city council meetings and lobbying for restrictive zoning and stupid building codes. We’ve managed to significantly erode the capabilities of the US military and stunt progress in the field just by guaranteeing contractors a fixed percentage profit margin on top of their costs. These same contracts produced the great stagnation in the aerospace industry, ensuring we haven’t returned to the Moon in over 50 years and leaving us unable to reach low Earth orbit for a decade.
Hell, Germany just shut down their remaining nuclear power plants in the middle of an energy crisis because a bunch of misguided idiots from the Green Party think nuclear power is unsafe. They managed to convince government officials to shut down operational nuclear plants and replace them with coal-fired power plants using coal [SUPPLIED BY RUSSIA.](https://www.nytimes.com/2022/04/05/business/germany-russia-oil-gas-coal.html)
Modern societies EXCEL at producing burdensome regulation, even in cases where it’s counter to nearly everyone’s short-term interest. Yet this approach to buying more time has been basically ignored. Why? How many years of time are we losing by foregoing this approach?
I think a basic plan for slowing down AI capabilities progress would look something like this:
Lobby government officials to create a new committee on “AI bias and public welfare”. (Creating committees that increase bureaucracy in response to public concern is a favorite pastime of Congress.) Task this committee with approving the deployment of new machine learning models. Require that all new model deployments (defined in a way so as to include basically all state-of-the-art models) be approved by the committee in the same way that electronic medical records systems have to be approved as “HIPAA compliant”.
Conduct a public relations campaign to spread awareness of the “bias and danger” created by AI systems. Bring up job loss created by AI. Do a 60 Minutes interview with the family of the guy who lost his job to AI, turned to fentanyl, and became the latest statistic in the rising toll of deaths of despair. Talk about how racist and biased AI systems are, and how companies can’t be trusted to use them in the public interest. Use easy concrete examples of harm that has already been done, and spread fear about the increasing incidence of this type of harm. Find people who have been personally hurt by AI systems and have them testify in front of lawmakers.
Institute an onerous IRB approval process for academic publications on AI where researchers have to demonstrate that their system won’t cause harm. Add new requirements every time something goes wrong in a way that embarrasses the university. Publicly shame universities and funding orgs that don’t follow this process, and accuse them of funding research that allows racism/sexism/inequity to persist.
Hire think tanks to write white papers about the harm caused by poorly designed, quickly deployed AI systems. Share these with congressional staffers. Emphasize the harm done to groups/things you know that representative already cares about.
Take advantage of the inevitable fuck-ups and disasters caused by narrow AI to press leaders for tighter regulations and more bureaucracy.
Recruit more EAs from China to join this project there (particularly those with high-level connections in the CCP).
If we can get this kind of legislation passed in the US, which is probably the world leader in terms of AI, I think it will be significantly easier for other countries to follow suit. One of my biggest takeaways from the COVID pandemic is world leaders have a strong herd mentality. They tend to mimic the behavior and regulations of other countries, even in cases where doing so will in expectation lead to tens of thousands of their citizens dying.
I think we have a far easier case to make for regulating AI than, say, preventing challenge trials from taking place or forcing COVID vaccines to go through a year-long approval process while hundreds of thousands of people died.
I’d appreciate other people’s thoughts on this plan, particularly people who work in government or politics.
I’ve thought a bit about ideas like this, and talked to much smarter people than myself about such ideas—and they usually dismiss them, which I take as a strong signal this may be a misguided idea.
I think the Machiavellian model of politics is largely correct—and it just is the case that if you look closely at any great change in policy you see, beneath the idealized narrative, a small coterie of very smart ideologues engaging in Machiavellian politics.
To the extent overt political power is necessary for EA causes to succeed, Machiavellian politics will be necessary and good. However, this sort of duplicitous regulatory judo you advocate strikes me as likely to backfire: by politicizing AI in this way, those working on the actually important AI safety research become very tempting targets for the mechanisms you hope to summon. We see hints of this already.
To the extent it is possible to get people with correct understanding of the actually important problem in positions of bureaucratic and moral authority, this seems really, really good. Machiavellian politics will be required to do this. Such people may indeed need to lie about their motivations. And perhaps they may find it necessary to manipulate the population in the way you describe.
However, if you don’t have such people actually in charge and use judo mind tricks to manipulate existing authorities to bring AI further into the culture war, you are summoning a beast you, by definition, lack the power to tame.
I suspect it would backfire horribly: it would incentivize safety-washing of various kinds in the existing organizations best positioned to shape regulation, make new alignment orgs like Conjecture and Redwood very difficult to start, and, worst of all, make overtly caring about the actual problem very politically difficult.
> I’ve thought a bit about ideas like this, and talked to much smarter people than myself about such ideas—and they usually dismiss them, which I take as a strong signal this may be a misguided idea.
I honestly don’t know whether slowing down AI progress in these ways is/isn’t a good idea. It seems plausibly good to me. I do think I disagree about whether the “much smarter people”s dismissal of these ideas is a strong signal.
Why I disagree about the strong signal thing:
I had to push through some fear as I wrote the sentence about it seeming plausibly good to me, because as I wrote it I imagined a bunch of potential conflict between e.g. AI safety folks, and AI folks. For example, a few months ago I asked a safety researcher within a major AI lab what they thought of e.g. people saying they thought it was bad to do/speed AI research. They gave me an expression I interpreted as fear, swallowed, and said something like: gosh, it was hard to think about because it might lead to conflict between them and their colleagues at the AI lab.
At least one person well-versed in AI safety, who I personally think is also not stupid about policy, has told me privately that they think it’s probably helpful if people-at-large decide to try to slow AI or to talk about AI being scary, but that it seems disadvantageous (for a number of good, altruistic reasons) for them personally to do it.
Basically, it seems plausible to me that:
1. There’s a “silence of elites” on the topic of “maybe we should try by legal and non-violent means to slow AI progress, e.g. by noticing aloud that maybe it is anti-social for people to be hurtling toward the ability to kill everyone,”
2. Lots of people (such as Trevor1 in the parent comment) interpret this silence as evidence that the strategy is unpromising,
3. But it is actually mostly evidence that many elites are in local contexts where their personal ability to do the specific work they are trying to do would be harmed by them saying such things aloud, plus social contagion/mimicry of such views.
I am *not* sure the above 1-3 is the case. I have also talked with folks who’ve thought a lot about safety and who honestly think that existential risk is lower if we have AI soon (before humanity can harm itself in other ways), for example. But I think the above is plausible enough that I’d recommend being pretty careful not to interpret elite silence as necessarily meaning there’s a solid case against “slow down AI,” and pushing instead for inside-view arguments that actually make sense to you, or to others who you think are good thinkers and who are not politically entangled with AI or AI safety.
When I try it myself on an inside view, I see things pointing in multiple directions. Would love to see LW try to hash it out.
All of this makes sense, and I do agree that it’s worth consideration (I quadruple upvoted the check mark on your comment). Mainly via in-person conversations, since the absolute worst-case scenario with in-person conversations is that new people learn a ton of really good information about the nitty-gritty problems with mass public outreach, such as international affairs. I don’t know if there’s a knowable upper bound on how wayward/compromised/radicalized this discussion could get if it takes place predominantly on the internet.
I’d also like to clarify that I’m not “interpreting this silence as evidence”, I’ve talked to AI policy people, and I also am one, and I understand the details of why we reflexively shoot down the idea of mass public outreach. It all boils down to ludicrously powerful, territorial, invisible people with vested interests in AI, and zero awareness of what AGI is or why it might be important (for the time being).
It seems hard to make the numbers come out that way. E.g. suppose human-level AGI in 2030 would cause a 60% chance of existential disaster and a 40% chance of existential disaster becoming impossible, and human-level AGI in 2050 would cause a 50% chance of existential disaster and a 50% chance of existential disaster becoming impossible. Then to be indifferent about AI timelines, conditional on human-level AGI in 2050, you’d have to expect a 1⁄5 probability of existential disaster from other causes in the 2030-2050 period. (That way, with human-level AGI in 2050, you’d have a 1⁄2 * 4⁄5 = 40% chance of surviving, just like with human-level AGI in 2030.) I don’t really know of non-AI risks in the ballpark of 10% per decade.
(My guess at MIRI people’s model is more like 99% chance of existential disaster from human-level AGI in 2030 and 90% in 2050, in which case indifference would require a 90% chance of some other existential disaster in 2030-2050, to cut 10% chance of survival down to 1%.)
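A minimal sketch of that arithmetic, treating the probabilities above as purely hypothetical inputs rather than forecasts:

```python
# Illustrative only: the probabilities are the hypothetical numbers from the
# comment above, not real forecasts.

def p_survive(p_disaster_from_agi, p_other_disaster_before_agi):
    """Chance of surviving: no other disaster arrives first, then AGI goes well."""
    return (1 - p_other_disaster_before_agi) * (1 - p_disaster_from_agi)

# AGI in 2030: 60% chance of disaster, nothing else gets a chance to kill us first.
survive_2030 = p_survive(0.60, 0.0)        # 0.40

# AGI in 2050: 50% chance of disaster; indifference requires a 1/5 chance of some
# other existential disaster during 2030-2050.
survive_2050 = p_survive(0.50, 0.20)       # 0.8 * 0.5 = 0.40

# MIRI-flavored numbers: 99% disaster in 2030 vs 90% in 2050 requires a 90% chance
# of some other disaster in 2030-2050 for the two timelines to come out equal.
survive_2030_miri = p_survive(0.99, 0.0)   # 0.01
survive_2050_miri = p_survive(0.90, 0.90)  # 0.1 * 0.1 = 0.01

print(survive_2030, survive_2050, survive_2030_miri, survive_2050_miri)
```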
(Note, this post seems to have been originally published roughly a year ago, and so my reply here might be responding to an out-of-date argument.)
I guess I’m just much more optimistic than most people here that slowing down AI via regulations is the default, reasonable path that we should expect society to take in the absence of further interventions from longtermists. The reason for my belief has perhaps become more apparent in the last six months, but it’s worth repeating. My argument is built on a few simple premises:
Most people aren’t excited about the idea of transformative AI radically upending human life, and so will demand regulations if they think that transformative AI is imminent. I expect regulations to focus on mitigating harms from job losses, making sure the systems are reliable, and ensuring that powerful models can’t be exploited by terrorists.
AI will get gradually more powerful over the course of years, rather than being hidden in the background until suddenly godlike AI appears and reshapes the world. In these years, the technology will be rolled out on a large scale, resulting in most people (especially tech-focused young people) recognizing that these technologies are becoming powerful.
AIs will be relatively easy to regulate in the short-term, since AI progress is currently largely maintained by scaling compute budgets, and large AI supercomputers are easy to monitor. GPU production is highly centralized, and it is almost trivial for governments to limit production, which would raise prices, delaying AI.
Given that we’re probably going to get regulations by default, I care much more about ensuring that our regulations are thoughtful and well-targeted. I’m averse to pushing for just any regulations whatsoever on the grounds that they’re “the best hope we have”. I don’t think stupid regulations are the best hope we have, and moreover, locking in stupid regulations could make the situation even worse!
> Recruit more EAs from China to join this project there (particularly those with high-level connections in the CCP)
I straight up don’t believe this is going to work. Trying to influence the Chinese government into restricting their own access to technology that is of the utmost geopolitical importance seems about as hard as AI alignment to me. I don’t want to say it’s impossible, but it’s close. Without absolute international cooperation on the issue we’re just shifting the problem into countries and research communities that are opaque to almost all the leading AI alignment researchers right now.
It’s also totally ordinary for talented researchers to move to whatever country supports their research. If you ban AI for stupid reasons, it’s especially unlikely that every other country will follow suit.
Well written post that will hopefully stir up some good discussion :)
My impression is that LW/EA people prefer to avoid conflict, and when conflict is necessary don’t want to use misleading arguments/tactics (with BS regulations seen as such).
Agreed. Maybe the blocker here is that LW/EA people don’t have many contacts in public policy, and are much more familiar with tech.
To be frank, I think the main reason this approach hasn’t been strongly tried is because most AI alignment enthusiasts are asocial nerds, who don’t know how to / are uncomfortable with trying to emotionally manipulate people in the ways needed for successful anti-AI propaganda. We need a ruthless PR team as effective as a presidential campaign staff, badly.
I’d also like to point out that most of the major players here are big tech companies which a large portion of the population already distrusts, and conservatives in particular could easily be swayed against them by leveraging their anti-elitist and anti-academic sentiments. But, again, most AI alignment enthusiasts are liberals and do not necessarily understand or respect the intelligence of conservatives, who are the ones most likely to actually recognize the danger and care, because conserving the status quo is literally what they do. Suppose we got Trump supporters on our side. This thought probably makes you itch; but it would pack a punch.
The main problem there, of course, is that the moment conservatives become anti something, liberals react by becoming even more vehemently pro it, and vice versa. I’m not sure how to navigate that, but we can’t simply ignore half the world and their potential support.
Prototype conservative rant:
There are actually a decent number of people working on slowing it down now. Have you seen the AGI Moratorium Slack or the Pause.AI Discord? These are the main hubs of activity for slowing down AGI.
It seems that the correct behavior in that case was not to worry at all, since the doomsday predictions never came to fruition, and now the bomb has faded out of public consciousness.
Overall, I think slowing research for any reason is misguided, especially in a field as important as AI. If you did what you’re saying in this post, you would also delay progress on many extremely positive developments like
Drug discovery
Automation of unpleasant jobs
Human intelligence augmentation
Automated theorem proving
Self-driving cars
Etc, etc
And those things are more clearly inevitable and very likely coming sooner than a godlike, malicious AGI.
Think about everything we would have missed out on if you had put this plan into action a few decades ago. There would be no computer vision, no DALL-E 2, no GPT-3. You would have given up so much, and you would not have prevented anything bad from happening.
How is this relevant? We haven’t hit AGI yet, so of course slowing progress wouldn’t have prevented anything bad from happening YET. What we’re really worried about is human extinction, not bias and job loss.
The analogy to nukes is a red herring. Nukes are nukes and AGI is AGI. They have different sets of risk factors. In particular, AGI doesn’t seem to allow for mutually assured destruction, which is the unexpected dynamic that has made nukes not kill us—yet.
As for everything we would’ve missed out on—how much better is DALL-E 3 really making the world?
I like technology a lot, as you seem to. But my rational mind agrees with OP that we are driving straight at a cliff and we’re not even talking about how to hit the brakes.
There are other reasons why nukes haven’t killed us yet—all the known mechanisms for destruction are too small, including nuclear winter.
So we’d only kill 99% of us and set civilization back 200 years? Great.
This isn’t super relevant to alignment, but it’s interesting that this is actually the opposite of why nukes haven’t killed us yet. The more we believe a nuclear exchange is survivable, the less the mutually assured destruction assumption keeps anyone from firing.
Strongly tying AI safety to wokeness in the mind of the public seems like a high-risk strategy, especially with the possibility of a backlash against wokism in the US—the baby might be thrown out with the bathwater.
Thanks for the post—I think there are some ways heavy regulation of AI could be very counterproductive or ineffective for safety:
If AI progress slows down enough in countries where safety-concerned people are especially influential, then these countries (and their companies) will fall behind in international AI development. This would eliminate much/most of safety-concerned people’s opportunities for impacting how AI goes.
If China “catches up” to the US in AI (due to US over-regulation) when AI is looking increasingly economically and militarily important, that could easily motivate US lawmakers to hit the gas on AI (which would at least undo some of the earlier slowing down of AI, and would at worst spark an international race to the bottom on AI).
Also, you mention,
From conversation, my understanding is some governance/policy folks fortunately have (somewhat) more promising ideas than that. (This doesn’t show up much on this site, partly because: people on here tend to not be as interested in governance, these professionals tend to be busy, the ideas are fairly rough, and getting the optics right can be more important for governance ideas.) I hear there’s some work aimed at posting about some of these ideas—until then, chatting with people (e.g., by reaching out to people at conferences) might be the best way to learn about these ideas.
These two ideas seem the closest to being reasonable strategies. I’d like to see think tanks prepare now for what to recommend if something triggers public demand for doing something about AI.
I share your concerns, as do some people I know. I also feel like thoughts along these lines have become more common in the community over the past few months, but nobody I was aware of had yet taken the first steps to try and galvanise a new effort on AGI risk outreach.
So we did: https://forum.effectivealtruism.org/posts/DS3frSuoNynzvjet4/agi-safety-communications-initiative
We’re not sure yet what we’re aiming for exactly, but we all think that current efforts at communicating AGI risk to the wider world are lacking, and we need AGI danger to be widely understood and politically planned for, like climate change is.
Personally, I’d like for our efforts to end up creating something like a new umbrella EA org specialised in wide outreach on AGI risk, from grand strategy conceptualising to planning and carrying out concrete campaigns to networking existing outreach efforts.
This isn’t 2012. AGI isn’t as far mode as it used to be. You can show people systems like PaLM, and how good their language understanding and reasoning capabilities are getting. I think that while it may be very hard, normal people could indeed be made to see that this is going somewhere dangerous, so long as their jobs do not directly depend on believing otherwise.
If you’d like to join up, do follow the link.
The main problem I see here is that support for these efforts does epistemic damage. If you become known as the group that supports regulations for reasons they don’t really believe to further other, hidden goals, you lose trust in the truthfulness of your communication. You erode the norms by which both you and your opponents play, which means you give them access to a lot of nefarious policies and strategies as well.
That being said, there’s probably other ideas within this space that are not epistemically damaging.
I wrote this a year ago, when I was not thinking about this topic as carefully and was feeling quite emotional about the lack of effort. At the time, people mostly thought slowing down AI was impossible or undesirable, for some good reasons and a lot of reasons that in hindsight look pretty dumb.
I think a better strategy would look more like “require new systems to guarantee a reasonable level of interpretability and pass a set of safety benchmarks”.
And eventually, if you can actually convince enough people of the danger, there should be a hard cap on the amount of compute that can be used in training runs that decreases over time to compensate for algorithmic improvements.
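As a rough sketch of how such a shrinking cap could be specified (the base cap and the assumed rate of algorithmic efficiency gain are made-up illustrative numbers, not a proposal):

```python
# Hypothetical sketch: a raw-compute cap that shrinks each year to offset an
# assumed rate of algorithmic efficiency improvement. All numbers are illustrative.

BASE_CAP_FLOP = 1e25            # assumed allowable training compute in the first year
EFFICIENCY_GAIN_PER_YEAR = 2.0  # assumed yearly factor of algorithmic improvement

def compute_cap(years_since_enactment):
    """Raw-FLOP training cap that holds 'effective compute' roughly constant."""
    return BASE_CAP_FLOP / (EFFICIENCY_GAIN_PER_YEAR ** years_since_enactment)

for year in range(6):
    print(year, f"{compute_cap(year):.2e} FLOP")
```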
If we consider that the thing in this example analogous to AI might not be “nuclear power” but rather “electricity generation”, then this could demonstrate stupid regulations impairing one form of a technology but thereby giving a relative advantage to a form of the technology which is more dangerous but in a way which is less visible to the stupid regulations.
Fascinating. When you put it like this (particularly when you include the different argument to conservatives), slowing down progress seems relatively easy.
I think this approach will get pushback, but even more likely just mostly be ignored (by default) by the rationalist community because our values create an ugh field that steers us away from thinking about this approach.
Rationalists hate lying.
I hate it myself. The idea of putting lots of effort into a publicity campaign that misrepresents my real concerns sounds absolutely horrible.
What about an approach in which we present those concerns without claiming they’re the only concern (and sort of downplay the extent to which actual x-risk from AGI is the main concern)?
I do care about bias in AI; it’s a real and unjust thing that will probably keep growing. And I’m actually fairly worried that job loss from AI could be so severe as to wreck most people’s lives (even people with jobs may have friends and family that are suddenly homeless). It may be so disruptive as to make the path to AGI even more dangerous by putting it in a cultural milieu of desperation.
The main problems with this approach are the “what about China?” question, and the problem of creating polarization by aligning the argument with one of the existing political tribes in the US. Simultaneously presenting arguments that appeal to liberals and conservatives might prevent that.
The remaining question is how more regulation might actually harm our chances of winding up with aligned AGI. There have been some good points raised about how that could potentially turn alignment teams into regulation-satisfying teams, and how it could prevent alignment-focused research, but it seems like that would take more analysis.
I have been saying similar things and really wish we had gotten started sooner. It’s a very promising approach.
I believe the general consensus is that it is impossible to totally pause AI development due to Molochian concerns. I am like you: if I could press a button to send us back to 2017 levels of AI technology, I would.
However, in the current situation, the intelligent people, as you noted, have found ways to convince themselves to take on a very high risk to humanity, and humanity’s general level of coordination is not enough to convince them otherwise.
There have been some positive updates, but it seems that we are not living in a world of general sanity and safety at this scale.
I have taken solace in the morbid amusement that many of the “powers that be” may indeed be dying with us, but they are quite blind.
“Humanity: extinct due to hyperbolic discounting behavior.”
Screaming loudly: “Hey, people in the government trying to mitigate this problem, could you please put in place ‘stupid regulations’ to slow down AI?”
Yes, thank you for loudly shouting to the world that AI safety regulations are “stupid” and are just to slow down AI progress, exactly the kind of messaging we need.
Humans still don’t seem to care much about the minimization of harm among all their foolish goals. Why not crush such selfish animals? Conscious beings that fail to criticize their own evolved “alignment” aren’t worth preserving, extinction would be a mercy.