@Steven Byrnes’ intense-world theory of autism seems like the sort of thing you’re looking for.
Thane Ruthenis
I agree that this isn’t an obviously unreasonable assumption to hold. But...
I don’t think the assumption is so likely to hold that one can assume it as part of a safety case for AI
… that.
The idea that all the labs focus on speeding up their own research threads rather than serving LLMs to customers is already pretty dubious. Developing LLMs and using them are two different skillsets; it would make economic sense for different entities to specialize in those things
I can maybe see it. Consider the possibility that the decision to stop providing public access to models past some capability level is convergent: e. g., the level at which they’re extremely useful for cyberwarfare (with jailbreaks still unsolved) such that serving the model would drown the lab in lawsuits/political pressure, or the point at which the task of spinning up an autonomous business competitive with human businesses, or making LLMs cough up novel scientific discoveries, becomes trivial (i. e., such that the skill level required for using AI for commercial success plummets – which would start happening inasmuch as AGI labs are successful in moving LLMs to the “agent” side of the “tool/agent” spectrum).
In those cases, giving public access to SOTA models would stop being the revenue-maximizing thing to do. It’d either damage your business reputation[1], or it’d simply become more cost-effective to hire a bunch of random bright-ish people and get them to spin up LLM-wrapper startups in-house (so that you own 100% stake in them).
Some loose cannons/open-source ideologues like DeepSeek may still provide free public access, but those may be few and far between, and significantly further behind. (And getting progressively scarcer; e. g., the CCP probably won’t let DeepSeek keep doing it.)
Less extremely, AGI labs may move to a KYC-gated model of customer access, such that only sufficiently big, sufficiently wealthy entities are able to get access to SOTA models. Both because those entities won’t do reputation-damaging terrorism, and because they’d be the only ones able to pay the rates (see OpenAI’s maybe-hype maybe-real whispers about $20,000/month models).[2] And maybe some EA/R-adjacent companies would be able to get in on that, but maybe not.
Also,
no lab has a significant moat, and the cutting edge is not kept private for long, and those facts look likely to remain true for a while
This is a bit flawed, I think. The situation is that runners-up aren’t far behind the leaders in wall-clock time. Inasmuch as the progress is gradual, this translates to runners-up being not-that-far-behind the leaders in capability level. But if AI-2027-style forecasts come true, with capability progress accelerating, a 90-day gap may become a “GPT-2 vs. GPT-4”-level gap. In which case alignment researchers having privileged access to true-SOTA models becomes important.
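To make that concrete, here’s a toy calculation with made-up numbers: suppose the rate of progress doubles every 12 months (an assumption purely for illustration, not a claim about the actual trajectory), and the runner-up trails the leader by a fixed 3 months of wall-clock time. A minimal sketch:

```python
import math

# Toy model: the rate of progress (in "2023-progress-months per calendar month")
# doubles every 12 months; these numbers are purely illustrative.
RATE_DOUBLING_MONTHS = 12
LAG_MONTHS = 3  # runner-up trails the leader by ~90 days of wall-clock time

def capability(t_months):
    # Cumulative progress: integral of 2^(t/12) dt from 0 to t.
    return (2 ** (t_months / RATE_DOUBLING_MONTHS) - 1) * RATE_DOUBLING_MONTHS / math.log(2)

for t in (12, 24, 36, 48):
    gap = capability(t) - capability(t - LAG_MONTHS)
    print(f"month {t}: leader is ~{gap:.0f} '2023-progress-months' ahead")
```

The wall-clock lag stays fixed at 3 months throughout, but the capability gap it buys the leader keeps compounding (roughly 6, 11, 22, 44 “progress-months” in this toy run).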
(Ideally, we’d have some EA/R-friendly company already getting cozy with e. g. Anthropic so that they can be first-in-line getting access to potential future research-level models so that they’d be able to provide access to those to a diverse portfolio of trusted alignment researchers...)
- ^
Even if the social benefits of public access would’ve strictly outweighed the harms on a sober analysis, the public outcry at the harms may be significant enough to make the idea commercially unviable. Asymmetric justice, etc.
- ^
Indeed, do we know it’s not already happening? I can easily imagine some megacorporations having had privileged access to o3 for months.
- ^
Orrr he’s telling comforting lies to tread the fine line between billion-dollar hype and nationalization-worthy panic.
Could realistically be either, but it’s probably the comforting-lies thing. Whatever the ground-truth reality may be, the AGI labs are not bearish.
Indeed, and maintaining this release schedule is a bit impressive. Though note that “a model called o4 is released” and “the pace of progress from o1 to o3 is maintained” are slightly different. Hopefully the release is combined with a proper report on o4 (not just o4-mini), so we get actual data regarding how well RL-on-CoTs scales.
FWIW, that’s not a crux for me. I can totally see METR’s agency-horizon trend continuing, such that 21 months later, the SOTA model beats METR’s 8-hour tests. What I expect is that this won’t transfer to real-world performance: you wouldn’t be able to plop that model into a software engineer’s chair, prompt it with the information in the engineer’s workstation, and get one workday’s worth of output from it.
At least, not reliably and not in the general-coding setting. It’s possible this sort of performance would be achieved in some narrow domains, and that this would happen once in a while on any task. (Indeed, I think that’s already the case?) And I do expect nonzero extension of general-purpose real-world agency horizons. But what I expect is slower growth, with the real-world performance increasingly lagging behind the performance on the agency-horizon benchmark.
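For reference, the arithmetic behind the “8-hour tests” figure, assuming roughly METR’s reported ~7-month horizon doubling time and a current 50%-success horizon on the order of an hour (both inputs are approximate):

```python
# Rough extrapolation of METR's agency-horizon trend (approximate inputs).
current_horizon_hours = 1.0   # current 50%-success horizon, order-of-magnitude
doubling_time_months = 7.0    # METR's reported doubling time, roughly
months_ahead = 21
projected_hours = current_horizon_hours * 2 ** (months_ahead / doubling_time_months)
print(f"~{projected_hours:.0f} hours")  # 3 doublings -> ~8 hours
```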
Fair point. The question of the extent to which those documents can be taken seriously as statements of company policy (as opposed to only mattering in signaling games) is still worthwhile, I think.
I can never tell how seriously to take those types of documents.
On one hand, AGI labs obviously have employees, including senior employees, who genuinely take the risks seriously (most notably, some very well-respected LW users, e. g. some of this specific document’s authors). I’m sure the people writing them are writing them in good faith.
On the other hand, the documents somehow never end up containing recommendations that would be drastically at odds with “race full steam ahead” (see the rather convenient Core Assumption 5 here, and subsequent “just do the standard thing plus amplified oversight” alignment plan) or opinions that could cause significant concern (see “not feeling the AGI/Singularity” in “3.6. Benefits of AGI”). And I have a nagging suspicion that if there’s ever a situation where the capability-maximizing thing to do would end up at odds with a recommendation from a published safety plan, the safety plan would be unceremoniously ignored/loopholed-around/amended. I think we saw instances of that already, and not only from OpenAI.
My current instinct is to just tune them out, on the assumption that the AGI lab in question (as opposed to the people writing the document) views them as just some nice-sounding non-binding PR.[1] Am I wrong to view it this way?
- ^
Poking holes in which is still important, kudos, Zvi.
Trying to evaluate this forecast in order to figure out how to update on the newer one.
It certainly reads as surprisingly prescient. Notably, it predicts both the successes and the failures of the LLM paradigm: the ongoing discussion regarding how “shallow” or not their understanding is, the emergence of the reasoning paradigm, the complicated LLM bureaucracies/scaffolds, lots of investment in LLM-wrapper apps which don’t quite work, the relative lull of progress in 2024, troubles with agency and with generating new ideas, “scary AI” demos being dismissed because LLMs do all kinds of whimsical bullshit...
And it was written in the base-GPT-3 era, before ChatGPT, before even the Instruct models. I know I couldn’t have come close to calling any of this back then. Pretty wild stuff.
In comparison, the new “AI 2027” scenario is very… ordinary. Nothing in it is surprising to me; it’s indeed the “default” “nothing new happens” scenario in many ways.
But perhaps the difference is in the eye of the beholder. Back in 2021, I barely knew how DL worked, forget being well-versed in deep LLM lore. The real question is, if I had been as immersed in the DL discourse in 2021 as I am now, would this counterfactual 2021!Thane have considered this forecast as standard as the AI 2027 forecast seems to 2025!Thane?
More broadly: “AI 2027” seems like the reflection of the default predictions regarding AI progress in certain well-informed circles/subcultures. Those circles/subcultures are fairly broad nowadays; e. g., significant parts of the whole AI Twitter. Back in 2021, the AI subculture was much smaller… But was there, similarly, an obviously maximally-well-informed fraction of that subculture which would’ve considered “What 2026 Looks Like” the somewhat-boring default prediction?
Reframing: @Daniel Kokotajlo, do you recall how wildly speculative you considered “What 2026 Looks Like” at the time of writing, and whether it’s more or less speculative than “AI 2027” feels to you now? (And perhaps the speculativeness levels of the pre-2027 and post-2027 parts of the “AI 2027” report should be evaluated separately here.)
Another reframing: To what extent do you think your alpha here was in making unusually good predictions, vs. in paying attention to the correct things at a time when no-one focused on them, then making fairly basic predictions/extrapolations? (Which is important for evaluating how much your forecasts should be expected to “beat the (prediction) market” today, now that (some parts of) that market are paying attention to the right things as well.)
Notably, the trend in the last few years is that AI companies triple their revenue each year
Hm, I admittedly only skimmed the Compute Forecast article, but I don’t think there’s much evidence for a trend like this? The “triples every year” statement seems to be extrapolated from two data points about OpenAI specifically (“We use OpenAI’s 2023 revenue of $1B and 2024 revenue around $4B to piece together a short term trend that we expect to slow down gradually”, plus maybe this). I guess you can draw a straight line through two points, and the idea of this trend following straight lines doesn’t necessarily seem unconvincing a priori… But is there more data?
50% algorithmic progress
Yeah, I concur with all of that: some doubts about 50% in April 2026, some doubts about 13% today, but seems overall not implausible.
Excellent work!
Why our uncertainty increases substantially beyond 2026
Notably, it’s also the date at which my model diverges from this forecast’s. Surprisingly, that’s later than I’d expected.
Concretely,
OpenBrain doubles down on this strategy with Agent-2. It is qualitatively almost as good as the top human experts at research engineering (designing and implementing experiments), and as good as the 25th percentile OpenBrain scientist at “research taste” (deciding what to study next, what experiments to run, or having inklings of potential new paradigms).
I don’t know that the AGI labs in early 2027 won’t be on a trajectory to automate AI R&D. But I predict that a system trained the way Agent-2 is described to be trained here won’t be capable of the things listed.
I guess I’m also inclined to disagree with parts of the world-state predicted by early 2026, though it’s murkier on that front. Agent-1’s set of capabilities seems very plausible, but what I’m skeptical about are the economic and practical implications (AGI labs’ revenue tripling and 50% faster algorithmic progress). As in,
People naturally try to compare Agent-1 to humans, but it has a very different skill profile. It knows more facts than any human, knows practically every programming language, and can solve well-specified coding problems extremely quickly. On the other hand, Agent-1 is bad at even simple long-horizon tasks, like beating video games it hasn’t played before.
Does that not constitute just a marginal improvement on the current AI models? What’s the predicted phase shift that causes the massive economic implications and impact on research?
I assume it’s the jump from “unreliable agents” to “reliable agents” somewhere between 2025 and 2026. It seems kind of glossed over; I think that may be an earlier point at which I would disagree. Did I miss a more detailed discussion of it somewhere in the supplements?
The latest generation of thinking models can definitely do agentic frontend development
But does that imply that they’re general-purpose competent agentic programmers? The answers here didn’t seem consistent with that. Does your experience significantly diverge from that?
My current model is that it’s the standard “jagged capabilities frontier” on a per-task basis, where LLMs are good at some sufficiently “templated” projects, and then they fall apart on everything else. Their proficiency at frontend development is then mostly a sign of frontend code being relatively standardized[1]; not of them being sufficiently agent-y.
I guess quantifying it as “20% of the way from an amateur to a human pro” isn’t necessarily incorrect, depending on how you operationalize this number. But I think it’s also arguable that they haven’t actually 100%’d even amateur general-coding performance yet.
- ^
I. e., that most real-world frontend projects have incredibly low description length if expressed in the dictionary of some “frontend templates”, with this dictionary comprehensively represented in LLMs’ training sets.
(To clarify: These projects’ Kolmogorov complexity can still be high, but their cross-entropy relative to said dictionary is low.
Importantly, the cross-entropy relative to a given competent programmer’s “template-dictionary” can also be high, creating the somewhat-deceptive impression of LLMs being able to handle complex projects. But that apparent capability would then fail to generalize to domains in which real-world tasks aren’t short sentences in some pretrained dictionary. And I think we are observing that with e. g. nontrivial backend coding?)
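A toy way to see the Kolmogorov-complexity-vs-cross-entropy distinction, using DEFLATE with a preset dictionary as a crude stand-in for “description length relative to a template dictionary”. The snippets below are invented purely for illustration:

```python
import zlib

def compressed_size(data: bytes, zdict: bytes = b"") -> int:
    # Crude proxy for description length: size in bytes after DEFLATE compression,
    # optionally with a preset dictionary standing in for a "template library".
    if zdict:
        comp = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS,
                                zlib.DEF_MEM_LEVEL, zlib.Z_DEFAULT_STRATEGY, zdict)
    else:
        comp = zlib.compressobj(9)
    return len(comp.compress(data) + comp.flush())

# Hypothetical "template dictionary" of boilerplate snippets, and a "project"
# that is mostly those snippets rearranged, plus a little genuinely new content.
templates = (b"<form onSubmit={handleSubmit}><input value={value} onChange={onChange}/></form>"
             b"<button className='btn btn-primary' onClick={handleSubmit}>Submit</button>"
             b"const [value, setValue] = useState('');")
project = (b"const [value, setValue] = useState('');"
           b"<form onSubmit={handleSubmit}><input value={value} onChange={onChange}/>"
           b"<button className='btn btn-primary' onClick={handleSubmit}>Submit</button></form>"
           b"/* a small amount of project-specific logic */")

print("standalone:", compressed_size(project))                   # proxy for raw complexity
print("given templates:", compressed_size(project, templates))   # typically much smaller
```

The project’s standalone compressed size stays close to its raw length (little internal redundancy), while its size given the template dictionary collapses: the “new information” relative to the dictionary is small.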
Having a second Google account specifically for AI stuff seems like a straightforward solution to this? That’s what I do, at least. Switching between them is easy.
Technological progress leading to ever-better, ever-more-flexible communication technology, which serves as an increasingly efficient breeding ground for ever-more-viral memes – and since virality is orthogonal to things like “long-term wisdom”, society ends up taken over by unboundedly destructive ideas?
I mention this up top in an AI post despite all my efforts to stay out of politics, because in addition to torching the American economy and stock market and all of our alliances and trade relationships in general, this will cripple American AI in particular.
Are we in a survival-without-dignity timeline after all? Big if true.
(Inb4 we keep living in Nerd Hell and it somehow mysteriously fails to negatively impact AI in particular.)
Competitive agents will not choose to in order to beat the competition
Competitive agents will choose to commit suicide, knowing it’s suicide, to beat the competition? That suggests that we should observe CEOs mass-poisoning their employees, Jonestown-style, in a galaxy-brained attempt to maximize shareholder value. How come that doesn’t happen?
Are you quite sure the underlying issue here is not that the competitive agents don’t believe the suicide race to be a suicide race?
alignment will be optimised away, because any system that isn’t optimising as hard as possible won’t survive the race
Off the top of my head, this post. More generally, this is an obvious feature of AI arms races in the presence of alignment tax. Here’s a 2011 writeup that lays it out:
Given abundant time and centralized careful efforts to ensure safety, it seems very probable that these risks could be avoided: development paths that seemed to pose a high risk of catastrophe could be relinquished in favor of safer ones. However, the context of an arms race might not permit such caution. A risk of accidental AI disaster would threaten all of humanity, while the benefits of being first to develop AI would be concentrated, creating a collective action problem insofar as tradeoffs between speed and safety existed.
I assure you the AI Safety/Alignment field has been widely aware of it since at least that long ago.
Also,
alignment will be optimised away, because any system that isn’t optimising as hard as possible won’t survive the race
Any (human) system that is optimizing as hard as possible also won’t survive the race. Which hints at what the actual problem is: it’s not even that we’re in an AI arms race, it’s that we’re in an AI suicide race which the people racing incorrectly believe to be an AI arms race. Convincing people of the true nature of what’s happening is therefore a way to dissolve the race dynamic. Arms races are correct strategies to pursue under certain conditions; suicide races aren’t.
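A toy payoff sketch of that distinction, with made-up numbers rather than anyone’s actual model: in arms-race payoffs, racing is the strictly better reply no matter what the other side does; in suicide-race payoffs, it never is.

```python
# Toy two-player payoffs (hypothetical numbers). Strategies: "race" (skip the
# alignment tax) vs. "careful" (pay it). Payoff shown is player 1's.
arms_race = {       # accidents survivable, winner takes most
    ("race", "race"): 5,     ("race", "careful"): 10,
    ("careful", "race"): 0,  ("careful", "careful"): 7,
}
suicide_race = {    # if anyone races, everyone loses
    ("race", "race"): -100,     ("race", "careful"): -100,
    ("careful", "race"): -100,  ("careful", "careful"): 7,
}

def racing_strictly_better(payoffs, opponent_move):
    # Is "race" a strictly better reply than "careful" against this opponent move?
    return payoffs[("race", opponent_move)] > payoffs[("careful", opponent_move)]

for name, game in [("arms race", arms_race), ("suicide race", suicide_race)]:
    print(name, {opp: racing_strictly_better(game, opp) for opp in ("race", "careful")})
# arms race: racing is the strictly better reply either way (a dominant strategy);
# suicide race: it never is. The dispute is over which matrix describes reality.
```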
I’ve skimmed™ what I assume is your “main essay”. Thoughtless Kneejerk Reaction™ follows:
You are preaching to the choir. Most of it is 101-level arguments in favor of AGI risk. Basically everyone on LW has already heard them, and either agrees vehemently, or disagrees with some subtler point/assumption which your entry-level arguments don’t cover. The target audience for this isn’t LWers; this is not content that’s novel and useful for LWers. That may or may not be grounds for downvoting it (depending on one’s downvote philosophy), but is certainly grounds for not upvoting it and for not engaging with it.
The entry-level arguments have been reiterated here over and over and over and over again, and it’s almost never useful, and everyone’s sick of them, and your essay didn’t signal that engaging with you on them would be somehow unusually productive.
If I am wrong, prove me wrong: quote whatever argument of yours you think ranks the highest on novelty and importance, and I’ll evaluate it.
The focus on capitalism likely contributed to the “this is a shallow low-insight take” impression. The problem isn’t “capitalism”; it’s myopic competitive dynamics/Moloch in general. Capitalism exhibits lots of them, yes. But a bunch of socialist/communist states would fall into the same failure mode; a communist world government would fall into the same failure mode (inasmuch as it would still involve e. g. competition between researchers/leaders for government-assigned resources and prestige). Pure focus on capitalism creates the impression that you’re primarily an anti-capitalism ideologue who’s aiming to co-opt the AGI risk for that purpose.
A useful take along those lines might be to argue that we can tap into the general public’s discontent with capitalism to more persuasively argue the case for the AGI risk, followed by an analysis regarding specific argument structures which would be both highly convincing and truthful.
Appending an LLM output at the end, as if it’s of inherent value, likely did you no favors.
I’m getting the impression that you did not familiarize yourself with LW’s culture and stances prior to posting. If so, that’s at the root of the problems you ran into.
Edit:
Imagine for a moment that an amateur astronomer spots an asteroid on a trajectory to wipe out humanity. He doesn’t have a PhD. He’s not affiliated with NASA. But the evidence is there. And when he contacts the people whose job it is to monitor the skies, they say: “Who are you to discover this?” And then refuse to even look in the direction he’s pointing.
A more accurate analogy would involve the amateur astronomer joining a conference for people discussing how to divert that asteroid, giving a presentation where he argues for the asteroid’s existence using low-resolution photos and hand-made calculations (to a room full of people who’ve observed the asteroid through the largest international telescopes or programmed supercomputer simulations of its trajectory), and is then confused why it’s not very well-received.
It’s been more than three months since o3 and still no o4, despite OpenAI researchers’ promises.
Deep Learning has officially hit a wall. Schedule the funeral.
[/taunting_god]
A new startup created specifically for the task. Examples: one, two.
Like, imagine that we actually did discover a non-DL AGI-complete architecture with strong safety guarantees, such that even MIRI would get behind it. Do you really expect that the project would then fail at the “getting funded”/”hiring personnel” stages?
tailcalled’s argument is the sole true reason: we don’t know of any neurosymbolic architecture that’s meaningfully safer than DL. (The people in the examples above are just adding to the AI-risk problem.) That said, I think the lack of alignment research going into it is a big mistake, mainly caused by the undertaking seeming too intimidating/challenging to pursue / by the streetlighting effect.