tl;dr:
From my current understanding, one of the following two things should be happening, and I would like to understand why neither is:
Either
Everyone in AI Safety who thinks slowing down AI is currently broadly a good idea should publicly support PauseAI.
Or
If pausing AI is much more popular than the organization PauseAI, that is a problem that should be addressed in some way.
Pausing AI
There does not seem to be a legible path to prevent possible existential risks from AI without slowing down its current progress.
I am aware that many people interested in AI Safety do not want to prevent AGI from being built EVER, mostly based on transhumanist or longtermist reasoning.
Many people in AI Safety seem to be on board with the goal of “pausing AI”, including, for example, Eliezer Yudkowsky and the Future of Life Institute. Neither of them is saying “support PauseAI!”. Why is that?
One possibility I could imagine: Could it be advantageous to hide “maybe we should slow down on AI” in the depths of your writing instead of shouting “Pause AI! Refer to [organization] to learn more!”?
Another possibility is that the majority opinion is actually something like “AI progress shouldn’t be slowed down” or “we can do better than lobbying for a pause” or something else I am missing. This would explain why people neither support PauseAI nor see this as a problem to be addressed.
Even if you believe there is a better, more complicated way out of AI existential risk, the pausing AI approach is still a useful baseline: Whatever your plan is, it should be better than pausing AI and it should not have bigger downsides than pausing AI has. There should be legible arguments and a broad consensus that your plan is better than pausing AI. Developing the ability to pause AI is also an important fallback option in case other approaches fail. PauseAI calls this “Building the Pause Button”:
Some argue that it’s too early to press the Pause Button (we don’t), but most experts seem to agree that it may be good to pause if developments go too fast. But as of now we do not have a Pause Button. So we should start thinking about how this would work, and how we can implement it.
Some info about myself: I’m a computer science student and familiar with the main arguments of AI Safety: I have read a lot of Eliezer Yudkowsky and did the AISF course reading and exercises. I have watched Robert Miles videos.
My conclusion is that either
(1) Everyone in AI Safety who thinks slowing down AI is currently broadly a good idea should publicly support PauseAI.
Or
(2) If pausing AI is much more popular than the organization PauseAI, that is a problem that should be addressed in some way.
Why is (1) not happening and (2) not being worked on?
How much of a consensus is there on pausing AI?
Let’s look at the two horns of the dilemma, as you put it:
Why do many people who want to pause AI not support the organization “PauseAI”?
Why would the organization “PauseAI” not change itself so that people who want to pause AI can support it?
Well, here are some reasons someone who wants to pause AI might not want to support the organization PauseAI:
When you visit the website for PauseAI, you might find some very steep proposals for Pausing AI—such as requiring the “Granting [of] approval for new training runs of AI models above a certain size (e.g. 1 billion parameters)” or “Banning the publication of such algorithms” that improve AI performance or prohibiting the training of models that “are expected to exceed a score of 86% on the MMLU benchmark” unless their safety can be guaranteed. Implementing these measures would be really hard—a one-billion parameter model is quite small (I could train one); banning the publication of information on this stuff would be considered by many an infringement on freedom of speech; and there are tons of models now that do better than 86% on the MMLU and have done no harm.
So, if you think the specific measures proposed by them would limit an AI that even many pessimists would think is totally ok and almost risk-free, then you might not want to push for these proposals but for more lenient proposals that, because they are more lenient, might actually get implemented. To stop asking for the sky and actually get something concrete.
If you look at the kind of claims that PauseAI makes in their risks page, you might believe that some of them seem exaggerated, or that PauseAI is simply throwing all the negative things they can find about AI into a big list to make it seem bad. If you think that credibility is important to the effort to pause AI, then PauseAI might seem very careless about truth in a way that could backfire.
So, this is why people who want to pause AI might not want to support PauseAI.
And, well, why wouldn’t PauseAI want to change?
Well—I’m gonna speak broadly—if you look at the history of PauseAI, they are marked by the belief that the measures proposed by others are insufficient for Actually Stopping AI: for instance, that the kind of policy measures proposed by people working at AI companies aren’t enough; that the kind of measures proposed by people funded by OpenPhil are often not enough; and so on. Similarly, they often believe that people who push back on these claims are nitpicking, and so on. (Citation needed.)
I don’t think this dynamic is rare. Many movements have “radical wings” that more moderate organizations in the movement would characterize as having impracticable, maximalist policy goals and careless epistemics. And the radical wings would of course criticize back that the “moderate wings” have insufficient or cowardly policy goals and epistemics optimized for respectability rather than truth. And the conflicts between them are intractable because people cannot move away from these prior beliefs about their interlocutors; in this respect the discourse around PauseAI seems unexceptional and rather predictable.
They are correct as far as I can tell. Can you identify a policy measure proposed by an AI company or an OpenPhil-funded org that you think would be sufficient to stop unsafe AI development?
I think there is indeed exactly one such policy measure, which is SB 1047, supported by the Center for AI Safety, which is OpenPhil-funded (IIRC). Most big AI companies lobbied against it, and Anthropic opposed the original, stronger version and got it reduced to a weaker and probably less-safe version. When I wrote about where I was donating in 2024, I went through a bunch of orgs’ policy proposals and explained why they appear deeply inadequate. Some specific relevant parts: 1, 2, 3, 4
Edit: Adding some color so you don’t have to click through: when I say the proposals I reviewed were inadequate, I mean they said things like (paraphrasing) “safety should be done on a completely voluntary basis with no government regulations” and “companies should have safety officers but those officers should not have final say on anything”, and would simply not address x-risk at all, or would make harmful proposals like “the US Department of Defense should integrate more AI into its weapon systems” or “we need to stop worrying about x-risk because it’s distracting from the real issues”.
“sufficient to stop unsafe AI development? I think there is indeed exactly one such policy measure, which is SB 1047,”
I think it’s obviously untrue that this would stop unsafe AI—it is as close as any measure I’ve seen, and would provide some material reduction in risk in the very near term, but (even if applied universally, and no one tried to circumvent it) it would not stop future unsafe AI.
Yeah I actually agree with that, I don’t think it was sufficient, I just think it was pretty good. I wrote the comment too quickly without thinking about my wording.
The EU AI Code of Practice is better, a little closer to stopping AI development.
Disagree that it could stop dangerous work, and doubly disagree given the way things are headed, especially with the removal of whistleblower protections and the lack of useful metrics for compliance. I don’t think it would even be as good as SB 1047, even in the amended weaker form.
I was previously more hopeful that if the EU COP were a strong enough code, then when things inevitably went poorly anyway we could say “look, doing pretty good isn’t enough, we need to actually regulate specific parts of this dangerous technology,” but I worry that it’s not even going to be strong enough to make that argument.
A couple notes on this:
AFAICT PauseAI US does not do the thing you describe.
I’ve looked at a good amount of research on protest effectiveness. There are many observational studies showing that nonviolent protests are associated with preferred policy changes / voting patterns, and ~four natural experiments. If protests backfired for fairly minor reasons like “their website makes some hard-to-defend claims” (contrasted with major reasons like “the protesters are setting buildings on fire”), I think that would show up in the literature, and it doesn’t.
I’m not trying to get into the object level here. But people could:
Believe that making such hard-to-defend claims could backfire, disagreeing with those experiments that you point out or
Believe that making such claims violates virtue-ethics-adjacent commitments to truth or
Just not want to be associated, in an instinctive yuck kinda way, with people who make these kinds of dubious-to-them claims.
Of course people could be wrong about the above points. But if you believed these things, then they’d be intelligible reasons not to be associated with someone, and I think a lot of the claims PauseAI makes are such that a large number of people would have these reactions.
Their website is probably outdated. I read their proposals as “keep the current level of AI, regulate stronger AI”. Banning current LLaMA models seems silly from an x-risk perspective, in hindsight. I think PauseAI is perfectly fine with pausing “too early”, which I personally don’t object to.
PauseAI is clearly focused on x-risk. The risks page seems like an attempt to guide the general public from naively-realistic “Present dangers” slowly towards introducing (exotic-sounding) x-risk. You can disagree with that approach, of course. I would disagree that mixing AI Safety and AI Ethics is being “very careless about truth”.
Thank you for answering my question! I wanted to know what you people think about PauseAI, so this fits well.
Yes. I hope we can be better at coordination… I would frame PauseAI as “the reasonable [aspiring] mass-movement”. I like that it is easy to support or join PauseAI even without having an ML PhD. StopAI is an organization more radical than them.
I feel kind of silly about supporting PauseAI. Doing ML research, or writing long fancy policy reports feels high status. Public protests feel low status. I would rather not be seen publicly advocating for doing something low-status. I suspect a good number of other people feel the same way.
(I do in fact support PauseAI US, and I have defended it publicly because I think it’s important to do so, but it makes me feel silly whenever I do.)
That’s not the only reason why people don’t endorse PauseAI, but I think it’s an important reason that should be mentioned.
I notice they have a “Why do you protest?” section in their FAQ. I hadn’t heard of these studies before.
Regardless, I still think there’s room to make protests cooler and more fun and less alienating, and when I mentioned this to them they seemed very open to it.
Personally, because I don’t believe the policy in the organization’s name is viable or helpful.
As to why I don’t think it’s viable, it would require the Trump-Vance administration to organise a strong global treaty to stop developing a technology that is currently the US’s only clear economic lead over the rest of the world.
If you attempted a pause, I think it wouldn’t work very well and it would rupture and leave the world in a worse place: Some AI research is already happening in a defence context. This is easy to ignore while defence isn’t the frontier. The current apparent absence of frontier AI research in a military context is miraculous, strange, and fragile. If you pause in the private context (which is probably all anyone could do), defence AI will become the frontier in about three years, and after that I don’t think any further pause is possible, because it would require a treaty against secret military technology R&D. Military secrecy is pretty strong right now. Hundreds of billions yearly is known to be spent on mostly secret military R&D, and probably more is actually spent.
(to be interested in a real pause, you have to be interested in secret military R&D. So I am interested in that, and my position right now is that it’s got hands you can’t imagine)
To put it another way, after thinking about what pausing would mean, it dawned on me that pausing means moving AI underground, and from what I can tell that would make it much harder to do safety research or to approach the development of AI with a humanitarian perspective. It seems to me like the movement has already ossified a slogan that makes no sense in light of the complex and profane reality that we live in, which is par for the course when it comes to protest activism movements.
I would be overjoyed if all AI research were driven underground! The main source of danger is the fact that there are thousands of AI researchers, most of whom are free to communicate and collaborate with each other. Lone researchers or small underground cells of researchers who cannot publish their results would be vastly less dangerous than the current AI research community, even if there are many lone researchers and many small underground teams. And if we could make it illegal for these underground teams to generate revenue by selling AI-based services or to raise money from investors, that would bring me great joy, too.
Research can be modeled as a series of breakthroughs such that it is basically impossible to make breakthrough N before knowing about breakthrough N-1. If the researcher who makes breakthrough N-1 is unable to communicate it to researchers outside of his own small underground cell of researchers, then only that small underground cell or team has a chance at discovering breakthrough N, and research would proceed much more slowly than it does under current conditions.
The biggest hope for our survival is the quite likely and realistic hope that many thousands of person-years of intellectual effort that can only be done by the most talented among us remain to be done before anyone can create an AI that could extinct us. We should be making the working conditions of the (misguided) people doing that intellectual labor as difficult and unproductive as possible. We should restrict or cut off the labs’ access to revenue, to investment, to “compute” (GPUs), to electricity and to employees. Employees with the skills and knowledge to advance the field are a particularly important resource for the labs; consequently, we should reduce or restrict their number by making it as hard as possible (illegal preferably) to learn, publish, teach or lecture about deep learning.
Also, in my assessment, we are not getting much by having access to the AI researchers: we’re not persuading them to change how they operate and the information we are getting from them is of little help IMHO in the attempt to figure out alignment (in the original sense of the word where the AI stays aligned even if it becomes superhumanly capable).
The most promising alignment research IMHO is the kind that mostly ignores the deep-learning approach (which is the sole focus as far as I know of all the major labs) and inquires deeply into which approach to creating a superhumanly-capable AI would be particularly easy to align. That was the approach taken by MIRI before it concluded in 2022 that its resources were better spent trying to slow down the AI juggernaut through public persuasion.
Deep learning is a technology created by people who did not care about alignment or wrongly assumed alignment would be easy. There is a reason why MIRI mostly ignored deep learning when most AI researchers started to focus on it in 2006. It is probably a better route to aligned transformative AI to search for another, much-easier-to-align technology (that can eventually be made competitive in capabilities with deep learning) than to search for a method to align AIs created with deep-learning technology. (To be clear, I doubt that either approach will bear fruit in time unless the AI juggernaut can be slowed down considerably.) And of course if they will be mostly ignoring deep learning, there’s little alignment researchers can learn from the leading labs.
For the US to undertake such a shift, it would help if you could convince them they’d do better in a secret race than an open one. There are indications that this may be possible, and there are indications that it may be impossible.
I’m listening to an Ecosystemics Futures podcast episode, which, to characterize… it’s a podcast where the host has to keep asking guests whether the things they’re saying are classified or not, just in case she has to scrub it. At one point, Lue Elizondo does assert, in the context of talking to a couple of other people who know a lot about government secrets and about situations where excessive secrecy may be doing a lot of harm, quoting Chris Mellon: “We won the Cold War against the Soviet Union not because we were better at keeping secrets; we won the Cold War because we knew how to move information and secrets more efficiently across the government than the Russians.” I can believe the same thing could potentially be said about China too; censorship cultures don’t seem to be good for ensuring the availability of information, so that might be a useful claim if you ever want to convince the US to undertake this.
Right now, though, Vance has asserted straight out many times that working in the open is where the US’s advantage is. That’s probably not true at all; working in the open is how you give your advantage away, or at least make it ephemeral. But that’s the sentiment you’re going to be up against over the next four years.
Good points, which in part explains why I think it is very very unlikely that AI research can be driven underground (in the US or worldwide). I was speaking to the desirability of driving it underground, not its feasibility.
A relevant FAQ entry: AI development might go underground
I think I disagree here:
This will change/is only the case for frontier development. I also think we’re probably in the hardware overhang. I don’t think there is anything inherently difficult to hide about AI, that’s likely just a fact about the present iteration of AI.
But I’d be very open to more arguments on this. I guess… I’m convinced there’s a decent chance that an international treaty would be enforceable and that China and France would sign onto it if the US was interested, but the risk of secret development continuing is high enough for me that it doesn’t seem good on net.
The Trump-Vance administration’s support base is suspicious of academia and has been willing to defund scientific research on the grounds of it being too left-wing. There is a schism emerging between multiple factions of the right wing: the right-wingers who are more tech-oriented and the ones who are more nation/race-oriented (the H1B visa argument being an example). This could lead to a decrease in support for AI in the future.
Another possibility is that the United States could lose global relevance due to economic and social pressure from the outside world, and due to organizational mismanagement and unrest from within. Then the AI industry could move to the UK/EU, making the UK/EU and China the main players in AI.
I think the concept of Pausing AI just feels unrealistic at this point.
Previous AI safety pause efforts (the GPT-2 release delay, the 2023 Open Letter calling for a 6-month pause) have come to be seen as false alarms and overreactions
Both industry and government are now strongly committed to an AI arms race
A lot of the non-AI-Safety opponents of AI want a permanent stop/ban in the fields they care about, not a pause, so it lacks for allies
It’s not clear that meaningful technical AI safety work on today’s frontier AI models could have been done before they were invented; therefore a lot of technical AI safety researchers believe we still need to push capabilities further before a pause would truly be useful
PauseAI could gain substantial support if there’s a major AI-caused disaster, so it’s good that some people are keeping the torch lit for that possibility, but supporting it now means burning political capital for little reason. We’d get enough credit for “being right all along” just by having pointed out the risks ahead of time, and we want to influence regulation/industry now, so we shouldn’t make Pause demands that get you thrown out of the room. In an ideal world we’d spend more time understanding current models, though.
I think this is wrong—the cost in political capital for saying that it’s the best solution seems relatively low, especially if coupled with an admission that it’s not politically viable. What I see instead is people dismissing it as a useful idea even in theory, saying it would be bad if it were taken seriously by anyone, and moving on from there. And if nothing else, that’s acting as a way to narrow the Overton window for other proposals!
I’m generally pretty receptive to “adjust the Overton window” arguments, which is why I think it’s good PauseAI exists, but I do think there’s a cost in political capital to saying “I want a Pause, but I am willing to negotiate”. It’s easy for your opponents to cite your public Pause support and then say, “look, they want to destroy America’s main technological advantage over its rivals” or “look, they want to bomb datacenters, they’re unserious”. (Yes, Pause as typically imagined requires international treaties, but the attack lines would probably still work; there was tons of lying in the California SB 1047 fight and we lost in the end.)
The political position AI safety has mostly taken instead on US regulation is “we just want some basic reporting and transparency” which is much harder to argue against, achievable, and still pretty valuable.
I can’t say I know for sure this is the right approach to public policy. There’s a reason politics is a dark art, there’s a lot of triangulating between “real” and “public” stances, and it’s not costless to compromise your dedication to the truth like that. But I think it’s part of why there isn’t as much support for PauseAI as you might expect. (the other main part being what 1a3orn says, that PauseAI is on the radical end of opinions in AI safety and it’s natural there’d be a gap between moderates and them)
Very briefly, the fact that “The political position AI safety has mostly taken” is a single stance is evidence that there’s no room even for other creative solutions, so we’ve failed hard at expanding that Overton window. And unless you are strongly confident in that as the only possibly useful strategy, that is a horribly bad position for the world to be in as AI continues to accelerate and likely eliminates other potential policy options.
A. Many AI safety people don’t support relatively responsible companies unilaterally pausing, which PauseAI advocates. (Many do support governments slowing AI progress, or preparing to do so at a critical point in the future. And many of those don’t see that as tractable for them to work on.)
B. “Pausing AI” is indeed more popular than PauseAI, but it’s not clearly possible to make a more popular version of PauseAI that actually does anything; any such organization will have strategy/priorities/asks/comms that alienate many of the people who think “yeah I support pausing AI.”
C.
This seems confused. Obviously P(doom | no slowdown) < 1. Many people’s work reduces risk in both slowdown and no-slowdown worlds, and it seems pretty clear to me that most of them shouldn’t switch to working on increasing P(slowdown).
This strikes me as a very strange claim. You’re essentially saying, even if a general policy is widely supported, it’s practically impossible to implement any specific version of that policy? Why would that be true?
For example I think a better alternative to “nobody fund PauseAI, and nobody make an alternative version they like better” would be “there are 10+ orgs all trying to pause AI and they all have somewhat different goals but they’re all generally pushing in the direction of pausing AI”. I think in the latter scenario you are reasonably likely to get some decent policies put into place even if they’re not my favorite.
Banning nuclear weapons is exactly like this. If it could be done universally and effectively, it would be great, but any specific version seems likely to tilt the balance of power without accomplishing the goal.
That’s kind-of what happened with the anti-nuclear movement, but it ended up doing lots of harm because the things that could be stopped were the good ones!
The global stockpile of nuclear weapons is down 6x since its peak in 1986. Hard to attribute causality but if the anti-nuclear movement played a part in that, then I’d say it was net positive.
(My guess is it’s more attributable to the collapse of the Soviet Union than to anything else, but the anti-nuclear movement probably still played some nonzero role)
I’m sure it played some nonzero role, but was its impact anything like large enough to compensate for all the marginal harm from global warming caused by stopping the deployment of nuclear power (which the movement is definitely largely responsible for)?
You think it’s obviously materially less? Because there is a faction, including Eliezer and many others, that thinks it’s epsilon and claims that the reduction in risk from any technical work is less than the acceleration it causes. (I think you’re probably right about some of that work, but I think it’s not at all obviously true!)
Thank you for responding!
A: Yeah. I’m mostly positive about their goal to work towards “building the Pause button”. I think protesting against “relatively responsible companies” makes a lot of sense when these companies seem to use their lobbying power more against AI-Safety-aligned Governance than in favor of it. You’re obviously very aware of the details here.
B: I asked my question because I’m frustrated with that. Is there a way for AI Safety to coordinate a better reaction?
C:
I phrased that a bit sharply, but I find your reply very useful:
These are quite strong claims! I’ll take that as somewhat representative of the community. My attempt at paraphrasing: It’s not (strictly?) necessary to slow down AI to prevent doom. There is a lot of useful AI Safety work going on that is not focused on slowing/pausing AI. This work is useful even if AGI is coming soon.
Saying “PauseAI good” does not take a lot of an AI Safety researcher’s time.
Some quick takes:
“Pause AI” could refer to many different possible policies.
I think that if humanity avoided building superintelligent AI, we’d massively reduce the risk of AI takeover and other catastrophic outcomes.
I suspect that at some point in the future, AI companies will face a choice between proceeding more slowly with AI development than they’re incentivized to, and proceeding more quickly while imposing huge risks. In particular, I suspect it’s going to be very dangerous to develop ASI.
I don’t think that it would be clearly good to pause AI development now. This is mostly because I don’t think that the models being developed literally right now pose existential risk.
Maybe it would be better to pause AI development right now because this will improve the situation later (e.g. maybe we should pause until frontier labs implement good enough security that we can be sure their Slack won’t be hacked again, leaking algorithmic secrets). But this is unclear and I don’t think it immediately follows from “we could stop AI takeover risk by pausing AI development before the AIs are able to take over”.
Many of the plausible “pause now” actions seem to overall increase risk. For example, I think it would be bad for relatively responsible AI developers to unilaterally pause, and I think it would probably be bad for the US to unilaterally force all US AI developers to pause if they didn’t simultaneously somehow slow down non-US development.
(They could slow down non-US development with actions like export controls.)
Even in the cases where I support something like pausing, it’s not clear that I want to spend effort on the margin actively supporting it; maybe there are other things I could push on instead that have better ROI.
I’m not super enthusiastic about PauseAI the organization; they sometimes seem to not be very well-informed, they sometimes argue for conclusions that I think are wrong, and I find Holly pretty unpleasant to interact with, because she seems uninformed and prone to IMO unfair accusations that I’m conspiring with AI companies. My guess is that there could be an organization with similar goals to PauseAI that I felt much more excited for.
It seems to me that to believe this, you have to believe all of these four things are true:
Solving AI alignment is basically easy
Non-US frontier AI developers are not interested in safety
Non-US frontier AI developers will quickly catch up to the US
If US developers slow down, then non-US developers are very unlikely to also slow down—either voluntarily, or because the US strong-arms them into signing a non-proliferation treaty, or whatever
I think #3 is sort-of true and the others are probably false, so the probability of all four being simultaneously true is quite low.
(Statements I’ve seen from Chinese developers lead me to believe that they are less interested in racing and more concerned about safety.)
I made a quick Squiggle model on racing vs. slowing down. Based on my first-guess parameters, it suggests that racing to build AI destroys ~half the expected value of the future compared to not racing. Parameter values are rough, of course.
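For concreteness, here is a minimal sketch (in Python, not the actual Squiggle model) of the kind of two-branch expected-value comparison being described, using the parameter values paraphrased in the reply below; the way the parameters combine (value realized only if the US leads and alignment is solved) is an illustrative assumption of mine, not necessarily how the real model works.

```python
# A minimal sketch (not the author's actual Squiggle model) of the kind of
# two-branch expected-value comparison described above. Parameter values come
# from the paraphrase in the reply below; the combination rule (value is
# realized only if the US leads AND alignment is solved) is an assumption
# made purely for illustration.

def expected_value(p_us_wins: float, p_alignment_solved: float) -> float:
    """Fraction of the future's value retained under one scenario."""
    return p_us_wins * p_alignment_solved

ev_race    = expected_value(p_us_wins=0.75, p_alignment_solved=0.25)  # ~0.19
ev_no_race = expected_value(p_us_wins=0.70, p_alignment_solved=0.50)  # ~0.35

print(f"racing retains {ev_race / ev_no_race:.0%} of the no-race expected value")
# -> roughly half, matching the "destroys ~half the expected value" claim
```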
I disagree that you have to believe those four things in order to believe what I said. I believe some of those and find others too ambiguously phrased to evaluate.
Re your model: I think your model is basically just: if we race, we go from a 70% chance that the US “wins” to a 75% chance the US wins, and we go from a 50% chance of “solving alignment” to a 25% chance? Idk how to apply that here: isn’t your Squiggle model talking about whether racing is good, rather than whether unilaterally pausing is good? Maybe you’re using “race” to mean “not pause” and “not race” to mean “pause”; if so, that’s super confusing terminology. If we unilaterally paused indefinitely, surely we’d have less than 70% chance of winning.
In general, I think you’re modeling this extremely superficially in your comments on the topic. I wish you’d try modeling this with more granularity than “is alignment hard” or whatever. I think that if you try to actually make such a model, you’ll likely end up with a much better sense of where other people are coming from. If you’re trying to do this, I recommend reading posts where people explain strategies for passing safely through the singularity, e.g. like this.
Yes the model is more about racing than about pausing but I thought it was applicable here. My thinking was that there is a spectrum of development speed with “completely pause” on one end and “race as fast as possible” on the other. Pushing more toward the “pause” side of the spectrum has the ~opposite effect as pushing toward the “race” side.
I’ve never seen anyone else try to quantitatively model it. As far as I know, my model is the most granular quantitative model ever made. Which isn’t to say it’s particularly granular (I spent less than an hour on it) but this feels like an unfair criticism.
In general I am not a fan of criticisms of the form “this model is too simple”. All models are too simple. What, specifically, is wrong with it?
I had a quick look at the linked post and it seems to be making some implicit assumptions, such as
the plan of “use AI to make AI safe” has a ~100% chance of working (the post explicitly says this is false, but then proceeds as if it’s true)
there is a ~100% chance of slow takeoff
if you unilaterally pause, this doesn’t increase the probability that anyone else pauses, doesn’t make it easier to get regulations passed, etc.
I would like to see some quantification of the form “we think there is a 30% chance that we can bootstrap AI alignment using AI; a unilateral pause will only increase the probability of a global pause by 3 percentage points; and there’s only a 50% chance that the 2nd-leading company will attempt to align AI in a way we’d find satisfactory, therefore we think the least-risky plan is to stay at the front of the race and then bootstrap AI alignment.” (Or a more detailed version of that.)
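As a gesture at what such a quantification might look like, here is a minimal sketch that plugs the hypothetical numbers above into explicit arithmetic. The decomposition itself and the placeholder for P(good outcome | global pause) are my illustrative assumptions, not something any organization has published.

```python
# A minimal sketch of the kind of quantification requested above, using the
# hypothetical numbers from the paragraph. The decomposition and the
# placeholder P(good outcome | global pause) are illustrative assumptions.

p_bootstrap_alignment  = 0.30  # chance of bootstrapping alignment using AI
p_pause_goes_global    = 0.03  # extra probability a unilateral pause becomes global
p_second_tries         = 0.50  # chance the 2nd-leading company attempts satisfactory alignment
p_good_if_global_pause = 0.80  # placeholder assumption, not from the text

# Plan A: stay at the front of the race, then try to bootstrap alignment.
p_good_race = p_bootstrap_alignment

# Plan B: pause unilaterally. Either the pause goes global, or the
# 2nd-leading company takes the lead and must both try and succeed.
p_good_pause = (p_pause_goes_global * p_good_if_global_pause
                + (1 - p_pause_goes_global) * p_second_tries * p_bootstrap_alignment)

print(f"P(good | race)  = {p_good_race:.2f}")   # 0.30
print(f"P(good | pause) = {p_good_pause:.2f}")  # ~0.17 under these inputs
```

Under these particular inputs, staying at the front looks less risky, which is exactly the kind of explicit, checkable claim the paragraph is asking for.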
I think we basically agree, but I think the Overton window needs to be expanded, and Pause is (unfortunately) already outside that window. So I differentiate between the overall direction, which I support strongly, and the concrete proposals and the organizations involved.
How much of a consensus is there on pausing AI?
Not much, compared to the push to get the stuff that already exists out to full deployment (for various institutions this has a meaningful impact on profit margins).
People don’t want to fight that, even if they think that further capabilities are a bad price/risk/benefit tradeoff.
There is a co-ordination problem where if you ask to pause and people say no, you can’t make other asks.
3rd. They might just not mesh with or trust that particular movement and the consolidation of platform it represents, and so want to make points on their own instead of joining a bigger organization’s demands.
One particular reason why I don’t support/endorse PauseAI, beyond the usual objections and one I haven’t seen addressed very much, is that there probably aren’t going to be that many warning shots that can actually affect policy, at least conditional on misalignment being a serious problem (which doesn’t translate to >50% probability of doom). The most likely takeover plan (at least assuming no foom/software intelligence explosion) fundamentally relies not on killing people, but on launching internal rogue deployments to sabotage alignment work and to figure out a way to control the AI company’s compute, since causing a catastrophe/existential risk is much harder than launching an internal rogue deployment (without defenses).
So PauseAI’s theory of change fundamentally requires that we live in worlds where both alignment is hard and effective warning shots exist, and these conditions are quite unlikely to be true, especially given that pausing is likely not the most effective action you could be doing from a comparative advantage perspective.
I’m not going to say that PauseAI is net-negative; it has positive expected value, but IMO far less than a lot of pause advocates claim:
https://www.lesswrong.com/posts/rZcyemEpBHgb2hqLP/ai-control-may-increase-existential-risk#jChY95BeDeptDpnZK
Important part of the comment:
I think AI safety has very limited political capital at the moment. Pausing AI just isn’t going to happen, so advocating for it makes you sound unreasonable and allows people to comfortably ignore your other opinions. I prefer trying to push for interventions which make a difference with much less political capital, like convincing frontier labs to work on and implement control measures.
I don’t think survivable worlds, at our point in time, involve something like PauseAI. I don’t condemn them, and welcome people to try. But it’s feeling more and more like Hiroo Onoda, continuing to fight guerrilla warfare in the Philippines for decades, refusing to believe the war was over.