AI pause/governance advocacy might be net-negative, especially without a focus on explaining x-risk

I found myself repeating the same words to multiple people, hence a short post.

I think some of the AI pause/governance advocacy might be net-negative. Three reasons:

  • Most importantly, it’s easy to get regulation implemented for reasons other than addressing x-risk, which leads to all sorts of failure modes in which it becomes actually harder to prevent x-risk with further regulation, and we all simply die a bit later[1];

  • Less importantly, when talking about a dangerous technology, it’s easy to incentivise governments to race to invest in that technology instead of preventing everyone from getting it;

  • To keep in mind: when talking about non-x-risk concerns that can help the pause, you might be outside your area of expertise and say something that any technical expert would recognise as wrong, leading them to conclude that you don’t know what you’re talking about.

Edit: I have somewhat updated my views since writing this post; in particular, see this comment.

Edit 2: I further changed my mind; I think this post and the comment still communicate something useful, but don’t fully represent my current views.

Epistemic status: idk, handwavy models, a bunch of relevant experience; some people who disagreed with me changed their minds when I talked through these points, and they haven’t made good points in response; I’ve seen docs that would’ve been harmful if important people had seen them; the authors agreed with some of my object-level objections and changed the texts. Seems good to put this out there.

If AI regulation isn’t explicitly aimed at x-risk, it can be net-negative

What I think:

It’s pretty important to remember what the aim is. It’s not to slow down AI but to prevent an existential catastrophe. “Slowing AI” might help somewhat, but it’s not enough, and some kinds of “slowing down AI” can make it much harder to get policymakers to also introduce regulation that prevents x-risk.

Some strategies involve advocating for or introducing AI regulations without mentioning x-risk, in the hope of locally slowing down AI progress, building frameworks that can later be used to address x-risk, or fostering relationships with policymakers. Many of them carry significant downside risks and are net-negative.

Many people don’t seem to consider politicians and voters to be grown-ups who can listen to arguments for why AI poses an x-risk and implement AI regulation that slows down AI for the reasons we (and everyone else who has thought about the problem) want AI to be slowed down. These people propose regulations that they think can help with x-risk but don’t present the regulations as motivated by x-risk. Aside from being dishonest (don’t be), this can backfire badly. It’s a nice bonus if the proposed regulations also help with other problems, but if addressing those other problems is the only aim the policymakers have, you can end up with AI systems that are safe and ethical right up until they’re smart enough to kill you.

Instead, you can explain the actual problem! Not necessarily your full thinking: obviously, it makes sense to simplify a lot. But the audience are not children; they’re smart, and they can understand what you’re talking about. And it’s possible to reach them and get them to listen, because you have a comparative advantage over every other problem that demands their time: many experts agree that yours is going to kill everyone, soon, unless something highly unusual is done. When that message lands, it produces a huge incentive for them to try to address the problem, and maybe to find experts in AI regulation with proposals that can address the x-risk at hand.

It might be more important to carefully explain why x-risk is real than to propose a specific regulation that we know can help with x-risk (especially if we’re not locked into a specific form and can get the policymakers to adjust it later).

Why:

My guess is that historically, either the politicians trying to prevent technological progress have lost, or their countries have lost.

By default, that probably makes the relevant governments more interested in addressing concerns about technology’s impact than in preventing a technology from developing. We need regulation that prevents anyone from developing ASI without solving ASI alignment, but governments are extremely unlikely to implement policies like that unless they are presented with arguments that couldn’t have been made about civilian tech that existed in the past; attempts to implement “helpful” regulation might be actively harmful.

Regulation designed to address shorter-term problems might delay AI development and make investing in general AI less attractive, but there will still be a huge financial incentive to train and utilise more general AI systems, and this sort of regulation won’t prevent x-risk. There are proposals along the lines of making AI companies liable for the harm their models cause, prohibiting public access to models that can be jailbroken or that understand biology well enough to help develop novel pathogens, etc. All of these are great, but on their own, they don’t prevent more advanced AI from being trained and don’t help at all if/once the labs can solve these problems.

Imagine you persuade the policymakers to take the “AI+bioweapons=bad”, “jailbreaks=not cool”, or “explainability=a must” position and address these issues with regulation. Things seem to be slowing down. But actually, OpenAI invests $2B in preventing jailbreaks in GPT-6, Anthropic creates and sells a service that excludes potentially dangerous bio info from training datasets, etc.; the labs continue scaling in the hope that nothing goes off the rails, their complicated scalable oversight systems start successfully pursuing alien goals, and soon, everyone dies[2].

Betting the future on the hope that it will be impossible to solve jailbreaks seems bad.

If the regulation doesn’t prevent training potentially existentially dangerous systems, and you haven’t added a robust mechanism that would later let you get the policymakers to change the regulation into one prohibiting potentially existentially dangerous training runs, then the regulation just makes everyone die a bit later.

Roughly, to change treaties, laws, or bills that have already been introduced or implemented, you’d have to get policymakers to:

  • listen to updates from you, even if things have since moved on from the stage/format where your input was taken into consideration;

  • buy the idea that there are other, actually existential (not job-loss-and-bioterror) dangers as well;

  • listen to you and want to help you despite being surprised that you knew about and were worried about existential risk all along (by the way, your plan is bad if it stops working when the people you want to befriend read a NYT article on “these Luddites are trying to infiltrate political elites to slow down economically valuable progress because they fear the Terminator”; and proposing policies not because you honestly believe they’re good, but because you might come off as an expert and gain “political influence” you can later use to raise awareness of the x-risk, seems wrong);

  • persuade everyone who at that point needs to be persuaded in order to amend the regulation so it addresses x-risk, even though getting everyone on board with amending it can be harder: there can be institutional lock-in around the existing regulation, with many people already convinced that it needs to exist for the stated reasons and to address the stated problems (e.g., imagine you got the lawmakers on board because they thought it was fine to get the economic benefits of AI slightly later, once jailbreaks are solved and the models don’t help anyone with bioweapons; once jailbreaks are no longer a problem, you tell them their country still can’t get the economic benefits, for completely unrelated reasons; this can easily turn into something political, and the AI labs won’t be on your side);

  • etc.

Your theory of change needs to aim at preventing x-risk. Slowing down AI is a leaky proxy, especially if you lose sight of the impact on x-risk; please don’t blindly optimise for the proxy.

It seems far easier to explain the problem and incentivise regulations that address it because they’re designed to address it. There are sets of people such that, if you persuade everyone in a set and they all know they all worry about the problem, you’ll get regulation that tries to address the problem. If you do that, you’ll likely be able to send them specific proposals for dealing with the problem, point them at existing AI governance experts who would love to work on something that reduces x-risk, and the people you’ve persuaded will likely be able to direct resources towards improving the policy proposals while keeping them aimed at addressing x-risk.

If you say that this is dangerous, it might sound attractive instead

Advocates for an AI pause often miss the ways their messaging can incentivise policymakers to move in the wrong direction.

The intelligence community will hear capabilities-related points ten times louder than danger-related points. We don’t want anyone training large models with dangerous capabilities, not even a military that wants to get those capabilities before bad actors do.

E.g., advocating for compute governance with the goal of preventing bad actors from gaining access to cutting-edge models that can hack everything could be okay if done extremely carefully, but we don’t want anything we say in support of regulation to incentivise state actors to invest heavily in advancing general AI.

If you don’t have a gears-level understanding, don’t communicate the problems unless you clearly state the epistemic status and point to the experts

One of the groups working on advocacy had a page stating something along the lines of: AI is more likely than not to be used by bad actors to hack literally everything, shut down the internet, and cause societal collapse. If someone it’s important to persuade had asked a cybersecurity expert about this threat model, the expert would’ve said that, even assuming AIs powerful enough to hack everything get trained, the story wouldn’t go this way, and that the authors don’t know what they’re talking about.[3] The page could also have incentivised people to invest in AI cyber-offence capabilities. I thought all of that carried a downside risk big enough to justify spending my time on it, so I talked to them, and they made some changes.

Many people advocating for AI regulation don’t understand why the problem is, indeed, a problem. They often don’t have a technical background that would allow them to explain the x-risk. Technical experts might say that these people don’t understand AI, and they would be right.[4] People from the leading AI labs might be able to predict everything these advocates can say and prepare replies to all their arguments.

Conclusion

This is really important to get right.

People working on advocacy are awesome. They’re smart and caring. But I think some of their strategies carry significant downside risks.

AI alignment won’t be solved just because we get a couple more years.

Make actual arguments people believe in, rather than trying to play 4d PR chess.

Honest arguments for slowing down that don’t include the x-risk might be fine if made extremely carefully. Be honest, be careful, do premortems of advocacy and public outreach attempts, and find people who can help you with those premortems.

  1. ^

    Not central, but possibly worth addressing: some people might have a theory of change in which their interventions cause everyone to die a bit later than they otherwise would. This might seem pretty valuable to some people, especially if they think it’s the best they can do. However, many of these interventions decrease the likelihood of preventing an existential catastrophe. If you care about future generations, and there’s a meaningful chance of stumbling across a way to increase the probability of humanity surviving, then the total potential change in that probability, times the sheer number of future people and the value of their lives, is probably worth much more than making 8 billion people live for a couple of years longer. If a direction of work makes it harder to prevent the x-risk, maybe don’t pursue it? (And if you’re not a longtermist, it might make more sense to donate to the charities that prolong lives the most per dollar in the time remaining, and not to focus on AI.)
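
    A rough back-of-the-envelope version of this comparison (my own illustrative notation, not estimates from this post): write $\Delta p$ for the change in the probability of preventing existential catastrophe, $N$ for the number of future people, $v_{\text{life}}$ for the value of a future life, $v_{\text{year}}$ for the value of one extra year of life, and $\Delta t$ for the extra years an intervention buys the people alive today. The two quantities being weighed are roughly

    $$\Delta p \cdot N \cdot v_{\text{life}} \quad \text{vs.} \quad 8 \times 10^{9} \cdot \Delta t \cdot v_{\text{year}},$$

    and since $N$ is plausibly astronomically larger than $8 \times 10^{9}$, even a tiny $\Delta p$ dominates a few extra years for the current population.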

  2. ^

    I expect people here to agree that the labs shouldn’t be able to train anything too smart until everyone agrees it is safe, so this post isn’t about why this happens by default. But if you’re working on alignment and think scalable oversight might work by default in less than 20 years, you might be losing sight of the hard parts of the problem. I wrote a post on this.

  3. ^

    In particular, the page did not address the fact that (I think; I’m not a cybersecurity expert) the people training the AI might try to find and patch potential vulnerabilities, and that, even with widely available models, an increase in AI abilities is likely to increase the number of bugs good actors can find and patch more than it increases the number of bugs bad actors can find and exploit. It talked about a likely societal collapse caused by bad actors shutting down the internet, but bad actors don’t really have an incentive to shut down the internet. With proprietary codebases, good actors gain the ability to find and fix bugs earlier than bad actors are able to exploit those systems as black boxes.[5]

  4. ^

    Technically, I don’t think anyone really understands AI, but there are varying degrees.

  5. ^

    I also think that by the point where AI is smart enough to discover important zero-days and write exploits for them much faster than humans, it’s probably already close to sharp-left-turn dynamics and to the capabilities necessary to escape and then kill everyone; you have to discover a way to align it before that happens, or it causes everyone’s death rather than a societal collapse.