There is a sharp distinction between losing control (even if that doesn’t result in extinction) and delegation without losing control. It’s the distinction between literally permanent disempowerment and the opportunity to grow towards eventually having the option to regain control over meaningful resources, where the future of humanity itself becomes superintelligent, at its own pace.
Anthropic plausibly didn’t and still doesn’t have enough TPUv7. But the model is probably not tens of trillions of parameters, just notably bigger than Opus 4. And Opus 4 is sized for efficient serving with Trainium 2 racks, so maybe 3T params (Opus 5 probably won’t change that, since Trainium 2 remains an important part of the fleet). Thus 10T params would qualify for a larger-than-Opus weight class. The potential observation from this model that I’m referring to is whether further scaling above Opus results in meaningful improvement (Opus itself already demonstrated that scaling above Sonnet works), thus motivating further feasible scaling beyond that point, towards tens of trillions of parameters that TPUv7/TPUv8 (and then Rubin Ultra Kyber racks) should be able to handle well enough.
There is no hard constraint that a model must fit in a single scale-up world; it’s possible to serve a model across dozens of scale-up worlds. But if large parts of it fit in one scale-up world, or better yet it fits entirely with room for KV-cache to spare, that does wonders for efficiency. So a 10T param model on Trainium 2 is not a disaster compared to serving (or RLVRing) it on H100s, but it’s going to be even better with TPUv7. And for a 1T param model there might be no difference between Trainium 2 and TPUv7 (for reasonable levels of interactivity, beyond what the specs of the underlying chips suggest).
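Here’s a minimal back-of-envelope sketch of that fit question. The per-device HBM capacities, the FP8 bytes-per-param figure, and the KV-cache fraction are all illustrative assumptions; the rack totals reuse numbers quoted elsewhere in these comments.

```python
# Back-of-envelope: do a model's weights fit in one scale-up world's HBM,
# with room left for KV-cache? All hardware figures are assumptions.

def fits_in_scaleup(params_trillions, bytes_per_param, hbm_tb, kv_frac=0.3):
    """True if weights fit while leaving kv_frac of HBM for KV-cache."""
    weights_tb = params_trillions * bytes_per_param  # 1T params at 1 byte/param = 1 TB
    return weights_tb <= hbm_tb * (1 - kv_frac)

# Assumed total HBM per scale-up world, in TB:
scaleup_hbm_tb = {
    "8x H100 server": 8 * 80 / 1024,                   # ~0.6 TB
    "Trainium 2 scale-up (64 chips)": 64 * 96 / 1024,  # ~6 TB
    "GB300 NVL72 rack": 20.0,
    "Rubin Ultra Kyber rack": 147.0,
}

for name, hbm in scaleup_hbm_tb.items():
    fits = [p for p in (1, 3, 10) if fits_in_scaleup(p, 1.0, hbm)]  # FP8 weights
    print(f"{name}: {fits} T-param models fit in one scale-up world")
```

Under these assumptions a 3T model fits comfortably within a Trainium 2 scale-up world while a 10T model doesn’t, which is one way to see why 10T params falls into a different weight class.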
The SemiAnalysis report [1] posted after GTC 2026 suggests that there likely won’t be an 8-rack scale-up Nvidia system ready for the 2027 buildout after all (contrary to what I speculated), and that even for the 2028 buildout it won’t be offered in significant quantities. If this is the case, then large scale-up worlds in the Nvidia buildout will follow the GTC 2025 timeline, with the first major change compared to GB200/GB300 Oberon racks (14/20 TB of HBM3e) being the Rubin Ultra Kyber racks (147 TB of HBM4E), with the full-scale buildout in 2028.
Google TPUs will keep their advantage in hypothetical models with tens of trillions of parameters until 2028, and we might soon observe from Anthropic’s rumored larger-than-Opus Claude 5 model whether that’s likely to become an important class of models before 2028.
[1] Specifically, they say it’s 8x Oberon for racks with 72 Rubin Ultra packages, 4 compute dies per package, shipped starting in 2027 (rather than 72 non-Ultra Rubin packages, 2 compute dies per package, shipped starting in 2026). And that the multi-rack scale-up will be uncomfortably expensive, so unlikely to be shipped in volume. Since Kyber racks of Rubin Ultra should be out at about the same time, it doesn’t seem crucial that this system is offered at all, other than as an early peek at the Feynman 8x Kyber (NVL1152) systems of 2028-2029.
These are in any case distinct ideas, and the history of one shouldn’t be used to argue against the use of the other. The framing of the edits on the LW wiki is terrible, with retagging of existing posts and excision from the record of the concept of “paperclip maximizer” in the form it actually came to have (even if not originally). I get the point that in some ways “paperclip maximizer” could be misleading, especially when intending to convey the lessons of “squiggle maximizer”, but it’s still its own thing with its own meaning and its own lessons.
If it’s the next level of pretraining compared to Opus 4 and Gemini 3 Pro, there’s potential for novel observations about what that does to the texture of capabilities. It’s the kind of thing that will predictably scale further soon without requiring algorithmic breakthroughs, and it’s not even clear that RLVR can be expected to deliver more phase changes in capabilities from pure scaling in the near future than pretraining can (even if it’s less than 1 phase change for either in expectation, until 2032 or so).
Introducing three notable classes of model sizes (Sonnet, Opus, above-Opus) is possibly the consequence of Anthropic needing to feed datacenters with three different classes of servers during the Claude 5 lifecycle: the smaller Nvidia 8-chip servers (H100/H200/B200), rack-scale Trainium 2, and TPUv7, each being able to serve larger models than the previous one efficiently. Meanwhile, OpenAI until very recently was stuck with mostly the 8-chip Nvidia servers and so had to use smaller models (they couldn’t serve their own Opus-class model efficiently), and only now they’re getting enough GB200/GB300 Oberon racks to offer an Opus-class flagship model soon. Though the Blackwell Oberon racks are better than Trainium 2, so there’s some advantage to what OpenAI will be able to serve compared to Opus 4, all else equal. And based on GPT-5.4 (which is likely in Sonnet’s weight class), currently OpenAI might be better at RLVRing capabilities than Anthropic for models of the same size, so OpenAI’s new Opus-class model might end up notably better than Opus 4. But by that time or a bit later Opus 5 will be released, so even if these considerations are on point, it’s still unclear which of them wins in the Opus weight class during late 2026.
Based on hardware considerations, I expect prices per token for the above-Opus class of Anthropic models to start out high, maybe 4x the price of Opus 5 (which itself probably won’t change much compared to Opus 4), because they’d need to serve it on suboptimal hardware initially. Prices might then go down to maybe 2x the price of Opus 5 at the end of the year, once the TPUv7 datacenters come online. This is what happened with Opus 4 over 2025, as Trainium 2 datacenters came online later in the year.
For example, one method you have available as a human is learning on the task itself. AIs also have this option, but due to garbage sample efficiency it wouldn’t work.
Sample efficiency for RLVR is plausibly good enough if relevant tasks and RL environments can be automatically formulated, which is currently out of reach but it’s unclear for how long it stays this way. And for in-context learning, sample efficiency could get higher if it worked well enough (as learned learning, it’s in principle not constrained by what hardcoded learning can do), it just doesn’t work well with pretraining.
Dangerous capabilities well short of superintelligence are followed by overwhelmingly catastrophic capabilities 20 years later. But superintelligence being impossible, or its sudden emergence (in a matter of years) being impossible, is a position that makes whatever happens 20 years after a more modest milestone less relevant, because whatever happens a few years down the line (such as the state of alignment and control) is shaped by what’s done before that, and there’s time to figure things out.
Gradual disempowerment is a relevant argument within that framing. Disputing the framing involves arguments that sudden superintelligence is possible, or that eventual superintelligence is a phase change that preceding work won’t prepare for (rather than an arbitrary point in a gradual process not distinct from all other points). Disputing the framing is more difficult, but accepting this framing makes people much more tolerant of continuing unbounded development of increasingly capable AI at the pace that the technology itself is asking for. So these two models of AI danger are not very aligned on policy.
Acausal trade is not about trade (or pursuit of particular goals), it’s about expanding the scope of coordination. When things in multiple places, or at multiple times, or in multiple possibilities can jointly decide what to do, with each instance carrying out its part according to a shared policy, that’s coordination.
A given agent/person is mostly coordinated across the instances, at different points in time, and between different possibilities (different possible situations that might be mutually exclusive in one timeline or not). Exploiting existing coordination is best done with updateless policy selection, to the extent that’s possible. But this doesn’t help with establishing coordination in the first place, especially across multiple agents that originally didn’t think about each other (but could benefit from acting in concert). Logical updating (starting to listen to some computation that would influence a policy, perhaps a contract shared with other parties, which likely breaks some properties that updatelessness wants) has the character of enabling new coordination to be established, and the idea of acausal trade is gesturing at this, the kind of thing that happens in the Prisoner’s Dilemma.
It seems strangely difficult to introduce pursuit of particular goals in protocols for establishing coordination that work this way. Possibly this is a clue that the process of carrying out logical updating that establishes coordination (once it’s decided how to coordinate/update) should be distinct from the process of deciding how to coordinate/update in a way that pursues particular goals.
This is probably not very legible, so here’s a more concrete sketch. A contract (taking the form of a computation that its signatories will let influence their policies or thinking) should be chosen according to one’s values, but not based on the specific consequences of adhering to it (which will typically remain unknown at least until-in-logical-time everyone signs, and then the contract needs to take a look at what it’s dealing with). Once signed (which can be an updateful step for the signatories in some sense), a contract that is itself an agent can (updatelessly) exploit coordination among the signing parties built in terms of its behavior across their situations, (updatelessly) pursuing its own values, which can be distinct from those of the signatories that listen to its policy in concert. This policy can depend on who decides to sign it, which is how an individual potential signatory gets to influence the contract’s behavior, which in turn influences the behavior of the other signatories that would listen to this shared contract.
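Here’s a toy rendering of this sketch as code, for a one-shot Prisoner’s Dilemma where a shared contract computation chooses actions for whoever signs it. The payoffs, the contract’s simple policy, and the default action are all illustrative assumptions, not a formal model of logical updating.

```python
# Toy sketch: a contract as a shared computation that its signatories
# let influence their policies, in a one-shot Prisoner's Dilemma.

PAYOFFS = {  # (row action, col action) -> (row payoff, col payoff)
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def contract(signatories):
    """The shared computation. Its policy depends on who signed:
    it only asks for cooperation when every party is listening to it."""
    if signatories == {"row", "col"}:
        return {"row": "C", "col": "C"}
    return {name: "D" for name in signatories}  # no joint deal, play it safe

def play(row_signs, col_signs):
    signatories = {name for name, signs in
                   (("row", row_signs), ("col", col_signs)) if signs}
    policy = contract(signatories)
    row = policy.get("row", "D")  # non-signatories fall back to default D
    col = policy.get("col", "D")
    return PAYOFFS[(row, col)]

for row_signs in (False, True):
    for col_signs in (False, True):
        print(f"row signs={row_signs}, col signs={col_signs}: {play(row_signs, col_signs)}")
# Only mutual signing yields (3, 3), and signing can't be exploited,
# because the contract's output is conditioned on who actually signed.
```

The snippet only makes concrete the last point above: the contract’s policy depends on who signs, so a potential signatory influences the other signatories’ behavior through the shared contract rather than directly.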
Pretraining memorizes all the facts in the world, but only gives weak fluid intelligence (in-context learning). RLVR trains crystallized intelligence that expresses itself in-context as strong fluid intelligence (which isn’t fake or illusory, within its scope), but this narrow strength falls apart sufficiently out of distribution (regressing to pretraining levels). Thus jaggedness (in fluid intelligence) is a good way of framing this, and currently the jaggedness profile is determined by RLVR training, the topics that were sufficiently covered in the RL training data. Humans are different in having a higher baseline of general fluid intelligence (than what pretraining gives LLMs), and thus in often possessing fluid intelligence that’s stronger than crystallized intelligence for the same topic, while LLMs always have strong crystallized intelligence for the topics where they have strong fluid intelligence (those covered by RLVR training).
General “smartness” might significantly improve from replacing pretraining with something more effective, applying RLVR much more broadly, or figuring out how to automatically train in response to post-deployment data (bringing it in-distribution for RLVR levels of in-context learning capability). None of these things are currently assured to be on track to happen quickly. The most straightforward step in this direction is automation of routine AI R&D that helps with applying RLVR during training to many more topics than is humanly feasible, expanding the areas covered by strong fluid intelligence. But even that plausibly runs into a wall of jaggedness in what kinds of things can be trained with RLVR, and of the absence of post-deployment training (at RLVR levels of capability).
So I think it remains plausible that “powerful AI” (that fully automates civilization) isn’t near. But also the scope of topics where AI is smart will increase in the next few years, and the rising tide of pretraining scale will improve the fallback level of smartness outside these topics (or for sufficiently novel constructions within these topics). It remains to be seen if this is sufficient to overcome the jaggedness of RLVR, either by sufficient competence at gluing narrow capabilities together (at pretraining levels of fluid intelligence), or by automating self-application of RLVR by LLMs to themselves that fixes gaps in RLVR-level capabilities for novel topics, as soon as they come up. Failing that, more breakthroughs may be required, which could take a hard-to-predict amount of time, making 10-20 years to “powerful AI” possible despite the current speed of progress.
I also don’t see an option to publish to Alignment Forum (from an LW draft). There is a “Move to Alignment” button in the menu for a post on the list of Drafts (the menu is under the three-dots-vertically button that appears on mouse-over on the draft post line, but otherwise doesn’t show). I guess it could be published there this way, but there’s nothing there about the EA Forum. Though I’m not sure if my account is considered registered on the EA Forum (if this happens on its own by default); not being registered there might be the reason the option doesn’t show for me.
(There was a change in the post interface recently, and these features might’ve been lost. Though the presence of the “Move to Alignment” option could be a reason to get rid of an explicit checkbox when posting, the alternative method feels unnecessarily hard to find, so this is plausibly not downstream of a deliberate decision, more like a consequence of practical difficulties. I only found it because I knew the feature existed previously; otherwise I might’ve assumed the feature was absent. Maybe I’m just failing to notice this in the new interface, where it should be obvious, like perhaps after pressing the “Publish” button there is an additional page, but since I’m not actually publishing anything now I can’t safely check.)
Cognition could have multiple parts that have nothing to do with each other, yet evolution (which has no mind at all) still found them. Thus it’s difficult to rule out that scaling some general learning method might find more of such parts, even when they aren’t found at lower levels of compute. While rapid scaling continues, it’s hard to see if there are genuine obstructions that won’t fall to feasible scale.
An LLM with a chain of thought is a general-purpose computer. In principle it could be implementing any cognitive algorithms, if the learning process gets the weights to do that. So a claim that an LLM can’t do something is a claim that the learning methods won’t train it to do it. With more scale, learning methods work better. And using RLVR, it might be possible to teach LLMs to experiment with methods for training themselves to do more things (implementing more parts of cognition in the weights), much more efficiently than evolution did to discover the human brain.
Nvidia to spend $26 billion to build open weight AI models. I am marking my Nvidia investments down to reflect a $26 billion lower target market cap. If Nvidia wants to invest in AI models they should invest in companies building AI models.
Nvidia is partnering with Mistral AI for at least some of this. Again, I would be investing instead, and not in Mistral.
Nvidia benefits from a broad ecosystem of smaller AI companies and API providers, in part so that a few giant AI companies won’t have too much negotiating power when buying Nvidia’s systems. Google has TPUs that are in some ways better, and AWS’s Trainium is a credible alternative (at least for internal use), so these companies already don’t need Nvidia as much as others do, and this could happen for the new giant AI companies as well at some point.
I’m starting to suspect that if 2026-2027 AGI happens through automation of routine AI R&D (automating acquisition of deep skills via RLVR), it doesn’t obviously accelerate ASI timelines all that much. Automated task and RL environment construction fixes some of the jaggedness, but LLMs are not currently particularly superhuman, and advancing their capabilities plausibly needs skills that aren’t easy for LLMs to automatically RLVR into themselves (as evidenced by humans not having made too much progress in RLVRing such skills).
This creates a strange future with broadly capable AGI that’s perhaps even somewhat capable of frontier AI R&D (not just routine AI R&D), but doesn’t accelerate further development beyond picking the low-hanging algorithmic fruit unlocked by a given level of compute faster than humans would (months instead of years, but bounded by what the current compute makes straightforward). If this low-hanging algorithmic fruit doesn’t by itself lead to crucial breakthroughs, AGIs won’t turn broadly or wildly superhuman before there’s much more compute, or before the few years in which human researchers would’ve made similar progress as these AGIs. And compute might remain gated by ASML EUV tools at 100-200 GW of new compute per year (3.5 tools occupied per GW of compute each year; maybe 250-300 EUV tools exist now, 50-100 will be produced per year, and about 700 will exist in 2030).
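A quick check of that EUV gating arithmetic, using only the rough figures from the text:

```python
# Rough check of the EUV-gated compute ceiling sketched above.
# Inputs are the text's rough figures, not measured data.

tools_per_gw_year = 3.5  # EUV tools occupied per GW of new compute per year

for tools in (250, 300, 700):  # assumed installed base now (low/high) and ~2030
    print(f"{tools} EUV tools -> ~{tools / tools_per_gw_year:.0f} GW of new compute per year")

# 250 -> ~71, 300 -> ~86, 700 -> ~200: consistent with the 100-200 GW/year
# gating range as the installed base grows through 2030.
```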
Eight-rack Oberon scale-up worlds for Rubin might be in the works, which potentially makes them ready for models with tens of trillions of parameters one year earlier than Kyber racks would’ve made this efficient with Rubin Ultra, in 2027-2028 rather than in 2028-2029.
There were some unclear communications from GTC 2026 about a two-layer all-to-all NVLink scale-up world called “NVL576”. This seems to be a system comprising 8 non-Ultra Rubin Oberon (as in NVL72) racks (each with 144 compute dies in 72 2-die packages), so 576 packages (1152 compute dies) across 8 racks. It’s confusingly announced as “Vera Rubin Ultra NVL576 will combine eight … racks, each with 72 Rubin Ultra GPUs”. (Another slight confusion when searching about it is that in 2025 “NVL576” referred to a single Rubin Ultra Kyber rack with 576 compute dies in 144 4-die packages, but that’s clearly a different system from the “NVL576” announced at GTC 2026.)
An 8-rack Rubin Oberon NVL576 system would have 165 TB of HBM4, so inferencing (and RLVRing) models with tens of trillions of parameters won’t be significantly less efficient than for models with trillions of parameters. This was TPUv7’s advantage (full buildout in 2026), and last year Nvidia only announced plans to close the gap with the Kyber rack for Rubin Ultra (576 compute dies in 144 4-die packages in one rack, 147 TB of HBM4E), which is due to come out in 2027, so that full-scale buildout would only conclude in 2028 (maybe early 2029). But Vera Rubin Oberon systems are already in production, and full-scale buildout will happen in 2027 (maybe early 2028, for some larger datacenter sites getting fully online).
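The package and die counts and the memory totals above can be sanity-checked directly; the per-package HBM capacities in the sketch are assumptions chosen to be consistent with the totals quoted in the text:

```python
# Sanity-check the NVL576 configuration and HBM totals described above.
# Per-package HBM capacities are assumptions consistent with the quoted totals.

racks = 8
packages_per_rack = 72
dies_per_package = 2               # non-Ultra Rubin
hbm_per_package_gb = 288           # assumed HBM4 per Rubin package

packages = racks * packages_per_rack           # 576 -> the "NVL576" name
dies = packages * dies_per_package             # 1152 compute dies
hbm_tb = packages * hbm_per_package_gb / 1000  # ~166 TB (the text's ~165 TB)
print(packages, dies, round(hbm_tb))

# One Rubin Ultra Kyber rack, for comparison:
kyber_hbm_tb = 144 * 1024 / 1000   # 144 packages x assumed 1 TB HBM4E each
print(round(kyber_hbm_tb))         # ~147 TB
```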
So if these two-layer scale-up worlds for Oberon are available from the start, the constraint of HBM per scale-up world gets lifted a year earlier, which might translate into models with tens of trillions of parameters getting RLVRed and becoming available a year earlier on non-TPU systems. This might be especially crucial for OpenAI (if models this large can be made more capable than 10x smaller models in the relevant timeframe), since they are mostly working with Nvidia hardware, but even for Anthropic this might make a difference (they are getting 1 GW of TPUv7 in 2026, but it’s unclear if they’ll be able to get meaningfully more TPUs in 2027-2028).
(Having enough hardware to efficiently serve inference for a model of some shape is necessary to deploy it as a flagship model. If instead most of the available hardware is only good at serving smaller models, then even if the larger model can be trained, it can’t be served to most of the users as cheaply as the better hardware allows. This makes it less likely that it gets trained in the first place, and so the capabilities of the most popular hardware indirectly translate into the shapes of models that get trained in practice, even when it’s possible to train larger models in principle, and these larger models could still be served a bit slower and more expensively on older hardware.)
My timelines didn’t notably update overall (I wrote this comment before re-reading the comment you linked). Automation of routine AI R&D is an answer to the question of what specifically causes AGI/RSI in 2026-2027, if it happens this early, but in my model getting a clearer sense of what form this might take doesn’t make AGI in 2026-2027 more likely. Most of my AGI probability is in unknown breakthroughs or scaling outcomes (where quantity becomes quality in a capability that wouldn’t a priori obviously be able to go that far, before the necessary quantity actually arrives). These things are enabled by more compute and then either follow quickly (low-hanging fruit at a given level of compute, unlikely to be accessed earlier even if possible in principle) or take multiple years (when needing human-invented conceptual advancements). As compute grows faster/slower, this directly influences the probability of AGI/RSI per year during the few years after that.
I expect compute buildout (for individual AI companies) to continue at the current pace (of 2-4x more becoming available each year) in 2022-2029, perhaps with low-hanging fruit getting picked through 2032, then slower growth in 2029-2035, and even slower after that (absent AGI). Without AGI by 2035-2045, a lasting ban/pause gets more likely as global cultural attitudes might change. So the highest per-year probability is in 2027-2032, then notably lower in 2032-2038, and even lower after that. And I’m placing the median in 2032-2033. Which means 10% per year in 2027-2032, extending the first 10% to 2026-2027 since there’s some visibility into the very near future that says this probably isn’t happening right now.
New-for-me considerations from mid 2025 to now are a clearer picture of capabilities of RLVR and its implications for AI company revenues, and some details on what might happen around the Rubin Ultra buildout. Turns out RLVR works for IMO gold even with relatively small models (DeepSeek-V3), and there are now some LLM solutions to technical open problems, so it’s probably sufficient for training the deep skills aspect of AGI (it’s more than mere elicitation), especially with bigger models. Though jaggedness still makes it less useful than that suggests. This made 100 billion dollar revenues (2-5 GW training systems) for AI companies before 2030 more likely than o1/o3 suggested on their own. Scaffoldings like Claude Code, especially with better post-deployment adaptation (what’s being foreshadowed as “continual learning”), make even trillion dollar revenues before 2030-2032 plausible (which means 30-50 GW training systems, but it’ll take more time to scale the supply chains and actually build that with hardware of the same generation, as a single system for an individual AI company, probably closer to mid-2030s).
For the Rubin Ultra buildout (2028-2029), individual 5 GW systems don’t seem to be in the works, which previously seemed to suggest that a slowdown in the trend already starts there, with the trend lasting at the current pace only during 2022-2026 and going 2x slower in 2026-2029. (Trillion dollar revenues only extend the slower part of the trend, after the initial slowdown somewhere in 2026-2029, as the supply chains struggle to catch up to the available funding, and before compute mostly stops growing other than through improved price performance of hardware.) But Nvidia’s bet on FP8 in Rubin makes 2027-2029 hardware 2x-4x more performant per GW than I expected (2x from chips that are faster in FP8 because they no longer care about BF16 as much, and maybe another 2x from the confidence that FP8 is a first-class citizen in training of even the largest models, where this confidence wasn’t already priced in). So even 2 GW Rubin training systems remain on trend, even though the trend previously asked for 5 GW training systems for the same compute. The fact that Nvidia is making this bet means others will likely be doing the same, so this doesn’t necessarily only concern OpenAI.
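A quick version of that trend arithmetic, where the multipliers are just the guesses above:

```python
# Does a 2x-4x FP8 performance-per-GW gain keep a 2 GW Rubin system on the
# trend that previously implied 5 GW? The multipliers are guesses, not specs.

trend_gw = 5.0               # power the old trend asked for at this compute level
for fp8_gain in (2.0, 4.0):  # assumed performance-per-GW multiplier from FP8
    print(f"{fp8_gain}x gain -> {trend_gw / fp8_gain:.2f} GW delivers the old 5 GW of compute")

# 2x -> 2.5 GW, 4x -> 1.25 GW; a 2 GW training system falls inside that range.
```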
There are sparks all over the place right now, sure. What matters is which of them get to contribute to wildfires, and which are lost in those wildfires started by completely different sparks, never given the opportunity to develop because of the order in which things happen. Not all sparks are sparks of wildfires. RSI is the wildfire, and not all sparks are relevant to RSI. Especially in actuality, where something else happens first and ends their relevance, rather than in principle, where in isolation and with enough resources they could be developed further.
Specificity is important when there are so many sparks. So I’m gesturing at a specific spark, automation of routine R&D, that might plausibly cause a wildfire before the other sparks grow similarly dangerous. Maybe it doesn’t catch, but in that case the other things still need to have a path towards learning of novel RLVR-level skills to have the potential, and many of them probably don’t, at least on their own.
Non-specific sparks matter for defense in depth, as in computer security where you harden the system against even the patently impossible interventions wherever it’s not too costly to do that. AI is existentially dangerous and nothing remotely close to the current methods can change that. But most sparks don’t matter for forecasting what is going to actually happen.
When RSI is not unbounded (doesn’t take steps that eventually proceed far beyond modern civilization), it shouldn’t count as RSI. Some test-time training things might work better than pure in-context learning, but not all post-deployment improvement is RSI, and not all RSI must happen post-deployment.
The most near-term path to RSI that seems plausible to me is not about any sort of continual learning or scaffolding, but automation of routine AI R&D, because it gets to leverage RLVR, the only method of training a wide variety of deep skills that currently works with LLMs. Automating application of RLVR to LLMs (part of routine AI R&D) thus suggests the possibility of genuine RSI. On the other hand, all the post-deployment things don’t have the immediate potential to make LLMs able to play good chess, or to become fluent in a novel topic of math (that is, something not already trained into the LLM with RLVR before deployment). These things might help indirectly though, as part of automated routine AI R&D, and of training LLMs to get better at routine AI R&D.
Values need to be developed and decided, they aren’t already settled to merely be obeyed or pursued. If values are being developed not in accordance with the volition of their originators, they don’t have legitimacy. Originators of values must personally have a hand in developing them further in some way, and plausibly all they could do (in many possible situations) contributes to that to some extent. Thus values are in content closer to the world (or just a home, with more personal influence) you are living in and building (across many possibilities, to counteract path dependence), than to directives you must follow.
There is a very popular framing coloring all thinking of some people, where seriously engaging with technological developments that are not immediately actionable is seen as deeply unvirtuous, and so the thought is never allowed proper consideration. The future that is not immediate is the immediate future’s responsibility, not your current self’s responsibility, and it’s irresponsible to be seriously concerned with it over the immediately actionable things you are working on, the things you are directly affecting and need to get right.
Thus observable “alignment” of modern AIs, in the sense of their good behavior, is not just a reasonable disambiguation of “alignment”, but the only one permitted by this stance. Being inclined to think that this helps in the long term doesn’t influence the outcome of seriously thinking only about current behavior. The claim that only the long term consequences of behavior, under RSI and society-scale development, are what ultimately matters is not permitted to be taken seriously; it’s not the background assumption that justifies the focus on current behavior of modern AIs.
It’s not that such people don’t believe ASI is coming, or that it’s coming in their own lifetime. Rather, the epistemic distortion of seeing serious engagement with unactionable things as intolerably unvirtuous makes their thinking and behavior indistinguishable from that of people who really believe ASI can never happen. This distortion can be pierced by belief that ASI is imminent, but once it’s plausibly a few years away it might as well be pure fiction. Exploratory engineering might also be helpful for detailed engagement, where the assumptions of a thought experiment permit thinking. But outside the thought experiments these assumptions are then not going to be taken seriously as gesturing at the actual future, which it would be virtuous to engage with as such.