A while ago I predicted a more-likely-than-not chance (60%) that Anthropic would run out of money trying to compete with OpenAI, Meta, and DeepMind. Both then and now, they have no image, video, or voice generation, unlike the others, and they do not handle image inputs as well either.
OpenAI’s costs are reportedly around $8.5 billion. Despite being flush with cash from a recent funding round, they were allegedly on the brink of bankruptcy and required a new, even larger round. Anthropic does not have the same deep pockets as the other players. Big tech companies that are not deeply invested in AI, like Apple, seem wary of investing in OpenAI; it stands to reason that Amazon may be as well. It is looking more likely that Anthropic will be left in the dust (80%).
The only winning path I see is that a new, more compute-efficient architecture emerges, they get to it first, and they manage to kick off RSI before better-funded competitors rush in to copy them. Since this seems unlikely, I think they are not going to fare well.
Frontier model training requires that you build the largest training system yourself, because no such system is already available for you to rent time on. Currently Microsoft builds these systems for OpenAI and Amazon builds them for Anthropic; since Microsoft and Amazon own the systems, OpenAI and Anthropic don’t pay for them in full. Google, xAI, and Meta build their own.
Models that are already deployed were trained with about 5e25 FLOPs, which takes about 15K H100s running for a few months. These training systems cost about $700 million to build. Musk announced that the Memphis cluster got 100K H100s working in Sep 2024, OpenAI reportedly got a 100K-H100 cluster working in May 2024, and Zuckerberg recently said that Llama 4 will be trained on over 100K GPUs. These systems cost $4-5 billion to build, and we’ll probably start seeing 5e26 FLOPs models trained on them starting this winter. OpenAI, Anthropic, and xAI each had billions invested in them, some of it in compute credits for the first two, so the orders of magnitude add up. This is just training; more goes to inference, but presumably revenue covers that part.
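As a rough sanity check on those numbers, here is a back-of-envelope calculation. The per-GPU throughput and utilization figures are my own ballpark assumptions (typical published values), not something from the claims above:

```python
# Sanity check of "5e25 FLOPs from ~15K H100s in a few months".
# Assumptions: ~1e15 dense BF16 FLOP/s per H100, ~35% utilization (MFU).
n_gpus = 15_000
peak_flops_per_gpu = 1e15   # dense BF16, approximate
mfu = 0.35                  # model FLOPs utilization on a large run
days = 100                  # "a few months"

total = n_gpus * peak_flops_per_gpu * mfu * days * 86_400
print(f"{total:.1e} FLOPs")  # -> 4.5e25, consistent with ~5e25
```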
There are already plans to scale to 1 gigawatt by the end of next year, both for Google and for Microsoft; in Microsoft’s case that means 500K B200s across multiple sites, which should require about $30-50 billion. Possibly we’ll start seeing the first 7e27 FLOPs models (about 300x original GPT-4) in the second half of 2026 (or maybe they’ll be seeing us). So for now, OpenAI has no ability to escape Microsoft’s patronage, because it can’t secure enough funding in time to start catching up with the next level of scale. And Microsoft is motivated to keep sponsoring OpenAI at whatever the current level of scaling demands, as long as it’s willing to build the next frontier training system.
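The same style of estimate roughly reproduces the 7e27 figure and the ~300x comparison, again under my own assumptions: ~2.2e15 dense BF16 FLOP/s per B200, 35% MFU, a five-month run, and the common outside estimate of ~2e25 FLOPs for original GPT-4:

```python
# Sanity check of "7e27 FLOPs from 500K B200s" and "~300x original GPT-4".
n_gpus = 500_000
peak_flops_per_gpu = 2.2e15  # dense BF16 per B200, assumed
mfu = 0.35
days = 150

total = n_gpus * peak_flops_per_gpu * mfu * days * 86_400
print(f"{total:.1e} FLOPs")          # -> 5.0e27, same order as 7e27
print(f"{total / 2e25:.0f}x GPT-4")  # -> 249x, roughly the quoted ~300x
```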
So far, the yearly capital expenditures of Microsoft Azure, Google Cloud Platform, and Amazon Web Services are about $50 billion each, and that covers their buildout across the whole world, so 2025 is going to start stressing their budgets. Also, I’m not aware of what’s going on with Amazon for 2025 and 1-gigawatt clusters (or even 2024 and 100K-H100 clusters), and Musk has mentioned plans for 300K B200s by summer 2025.
It’s possible that “not doing image, video or voice” is exactly what you need to create a more compute-efficient architecture.
Indeed. Although it’s interesting that Sonnet 3.5 now accepts images as input; it just doesn’t produce them. I expect that they could produce images, but are choosing to restrict that capability.
Haiku still doesn’t accept image input. My guess is that this is for efficiency reasons.
I’m curious if they’re considering releasing an even smaller, cheaper, faster, and more efficient model line than Haiku. I’d appreciate that, personally.
I was worried about Anthropic for a bit before the Claude 3.0 series came out. But then, seeing how much better Opus 3 was than GPT-4, I switched to thinking they had a chance. And thought so even more after Sonnet 3.5 came out and was better than, or almost as good as, Opus 3 at nearly everything.
I do agree they seem behind on a lot of things other than NLP and safety. I don’t think they need to catch up on those things to be first-to-RSI. So I think it’s going to depend a lot on how well they focus on that key research, versus getting off-track trying to catch up on non-critical-path stuff.
A lot of the difference could come from key researchers rather than just big funding. I believe more efficient algorithms exist to be found, so efficiency will increase fast once RSI starts.
In the past 6 months or so I’ve become more convinced that Anthropic pulling ahead would be really good for the world. I’ve started thinking hard about ways I could help make this happen. Maybe there are things that their engineers just don’t have time to experiment with, weird long-shot stuff, which outside researchers could explore and then only share their successful results with Anthropic? If enough researchers did that, it’d be like buying a bunch of lottery tickets for them.
https://x.com/arcprize/status/1849225898391933148?s=46&t=lZJAHzXMXI1MgQuyBgEhgA
My read of the events: Anthropic is trying to raise money and rushed out a half-baked model.
3.5 Opus has not yet produced the desired results. 3.5 Sonnet, being easier to iterate on, was tuned to beat OpenAI’s model on some arbitrary benchmarks in an effort to wow investors.
With the failed run of Opus, they presumably tried to get o1-like reasoning results or some agentic breakthrough. The previous 3.5 Sonnet was also particularly good because of a fluke of the training-run RNG (as with gpt-4-0314), which makes it harder for iterations to beat.
They are probably now rushing to scale inference-time compute. I wonder if they tried doing something with steering vectors initially for 3.5 Opus.
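For anyone unfamiliar with the term: a steering vector is a direction in activation space, typically computed from a pair of contrastive prompts and added (scaled) to a layer’s activations at inference time. A minimal toy sketch in PyTorch; everything here is illustrative, not anything known about Anthropic’s methods:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model = 16
block = nn.Linear(d_model, d_model)  # stand-in for one transformer block

# Contrastive activations: the residual stream on a "desired behavior"
# prompt vs. an "undesired behavior" prompt (random stand-ins here).
act_pos = torch.randn(d_model)
act_neg = torch.randn(d_model)
steering_vector = act_pos - act_neg  # direction to push along

# At inference, add the scaled vector to the activations entering a layer.
x = torch.randn(d_model)             # activation on a new input
alpha = 4.0                          # steering strength (tunable)
steered_out = block(x + alpha * steering_vector)
```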
I doubt very much that it’s something along the lines of random seeds that made the difference in quality between the various Sonnet 3.x runs. I expect it’s much more that they experimented with different datasets (including different sorts of synthetic data).
As for Opus 3.5, Dario keeps saying that it’s in the works and will come out eventually. The way he says this does make it seem like they’ve either hit some unexpected snag (as you imply) or deprioritized it because they are short on resources (engineers and compute) and decided it was better to focus on improving their smaller models. The recent release of a new Sonnet 3.5 and Haiku 3.5 nudges me toward thinking they’ve chosen to prioritize smaller models. The reasons for this choice are unclear. Has work gone into Opus 3.5, but with disappointing results so far? Does the inference cost (including the opportunity cost of devoting compute to inefficient inference) make the economics look unfavorable, even if Opus 3.5 is actually working pretty well? (Probably not overwhelmingly well, or they’d likely find some way to show it off even without opening it up for public API access.)
Are they rushing now to scale inference-time compute in o1/DeepSeek style? Almost certainly; they’d be crazy not to. Probably they’d already done some amount of this internally for generating higher-quality synthetic reasoning traces. I don’t know how soon we should expect to see a public-facing version of their inference-time-compute-scaled experiments, though. Maybe they’ll decide to keep it internal for a while and use it to help train better versions of Sonnet and Haiku? (Maybe also a private internal version of Opus, which in turn helps generate better synthetic data?)
It’s all so hard to guess at; I feel quite uncertain.