I’m worried that it will be hard to govern inference-time compute scaling.
My (rather uninformed) sense is that “AI governance” is mostly predicated on governing training and post-training compute, with the implicit assumption that scaling these will lead to AGI (and hence x-risk).
However, the paradigm has shifted to scaling inference-time compute. And I think this will be much harder to effectively control, because 1) it’s much cheaper to just run a ton of queries on a model as opposed to training a new one from scratch (so I expect more entities to be able to scale inference-time compute) and 2) inference can probably be done in a distributed way without requiring specialized hardware (so it’s much harder to effectively detect / prevent).
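As a rough illustration of point 1), here is a back-of-envelope sketch in Python. The training-run cost, per-token inference price, and tokens-per-query figures are all invented for illustration, not real estimates:

```python
# Toy back-of-envelope comparison: the cost of training a frontier model from
# scratch vs. the cost of buying a lot of inference on an existing model.
# Every number below is an illustrative assumption, not a real estimate.

TRAINING_RUN_COST = 1e9          # assumed cost of a frontier pretraining run (USD)
COST_PER_MILLION_TOKENS = 60.0   # assumed inference price for a strong model (USD)
TOKENS_PER_QUERY = 100_000       # assumed tokens per long chain-of-thought query

def queries_for_budget(budget_usd: float) -> float:
    """How many long reasoning queries a given budget buys at the assumed prices."""
    cost_per_query = COST_PER_MILLION_TOKENS * TOKENS_PER_QUERY / 1e6
    return budget_usd / cost_per_query

# A budget 1,000x smaller than the assumed training run still buys a lot of inference.
small_budget = TRAINING_RUN_COST / 1_000
print(f"${small_budget:,.0f} buys roughly {queries_for_budget(small_budget):,.0f} long queries")
```

The exact numbers don't matter; the point is that many more actors can afford to scale inference than can afford a frontier training run.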
Tl;dr the old assumption of ‘frontier AI models will be in the hands of a few big players where regulatory efforts can be centralized’ doesn’t seem true anymore.
Are there good governance proposals for inference-time compute?
I think it depends on whether or not the new paradigm is “training and inference” or “inference [on a substantially weaker/cheaper foundation model] is all you need.” My impression so far is that it’s more likely to be the former (but people should chime in).
If I were trying to have the most powerful model in 2027, it’s not like I would stop scaling. I would still be interested in using a $1B+ training run to make a more powerful foundation model and then pouring a bunch of inference into that model.
But OK, suppose I need to pause after my $1B+ training run because I want to do a bunch of safety research. And suppose there’s an entity that has a $100M training run model and is pouring a bunch of inference into it. Does the new paradigm allow the $100M people to “catch up” to the $1B people through inference alone?
My impression is that the right answer here is “we don’t know.” So I’m inclined to think that it’s still quite plausible that you’ll have ~3-5 players at the frontier and that it might still be quite hard for players without a lot of capital to keep up. TBC I have a lot of uncertainty here.
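One way to see why “we don’t know” is the honest answer: the outcome hinges on an exchange rate between inference spend and training spend that nobody has measured. A deliberately toy sketch (the linear “effective compute” model and every dollar figure are made up purely for illustration):

```python
# Toy illustration of why the "$100M + heavy inference vs. $1B" question is open:
# the answer flips depending on an unmeasured exchange rate between inference
# spend and training spend. All numbers and the linear model are made up.

def effective_compute(train_usd: float, inference_usd: float, exchange_rate: float) -> float:
    """Pretend every inference dollar is worth `exchange_rate` training dollars."""
    return train_usd + exchange_rate * inference_usd

big_lab_train, big_lab_inference = 1e9, 1e8      # $1B run, modest inference
small_lab_train, small_lab_inference = 1e8, 1e9  # $100M run, heavy inference

for rate in (0.1, 0.5, 1.0, 2.0):
    big = effective_compute(big_lab_train, big_lab_inference, rate)
    small = effective_compute(small_lab_train, small_lab_inference, rate)
    print(f"exchange_rate={rate}: small lab catches up? {small >= big}")
```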
Are there good governance proposals for inference-time compute?
So far, I haven’t heard (or thought of) anything particularly novel. It seems like standard things like “secure model weights” and “secure compute / export controls” still apply. Perhaps it’s more important to strive for hardware-enabled mechanisms that can implement rules like “detect if XYZ inference is happening; if it is, refuse to run and notify ABC party.”
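To make the HEM idea slightly more concrete, here is a minimal sketch of the kind of rule such a mechanism might enforce. The model IDs, threshold, and notify hook are all hypothetical placeholders; a real hardware-enabled mechanism would live in attested firmware or secure hardware, not application-level Python:

```python
from dataclasses import dataclass

@dataclass
class InferenceJob:
    model_id: str
    tokens_requested: int
    requester: str

# Hypothetical policy: refuse very large inference jobs on flagged models
# and notify an oversight party. Model IDs and thresholds are placeholders.
FLAGGED_MODELS = {"frontier-model-xyz"}
MAX_TOKENS_WITHOUT_REVIEW = 10_000_000

def notify(party: str, job: InferenceJob) -> None:
    # Placeholder for a tamper-resistant reporting channel.
    print(f"[HEM] notifying {party}: blocked job on {job.model_id} ({job.tokens_requested} tokens)")

def hem_gate(job: InferenceJob) -> bool:
    """Return True if the job may run; otherwise refuse and notify."""
    if job.model_id in FLAGGED_MODELS and job.tokens_requested > MAX_TOKENS_WITHOUT_REVIEW:
        notify("ABC oversight party", job)
        return False
    return True

# Example: a very large inference job on a flagged model gets refused.
print(hem_gate(InferenceJob("frontier-model-xyz", 50_000_000, "unknown-actor")))
```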
And in general, perhaps there’s an update toward flexible HEMs and toward flexible proposals in general. Insofar as o3 is (a) actually an important and durable shift in how frontier AI progress occurs and (b) surprised people, it seems like this should update (at least somewhat) against the “we know what’s happening and here are specific ideas based on specific assumptions” model and toward the view: “no one really understands AI progress and we should focus on things that seem robustly good. Things like raising awareness, increasing transparency into frontier AI development, increasing govt technical expertise, advancing the science of evals, etc.”
(On the flip side, perhaps o3 is an update toward shorter timelines. If so, the closer we get to systems that pose national security risks, the more urgent it will be for the government to Make Real Decisions™ and determine whether it wants to be involved in AI development in a stronger way. I continue to think that preparing concrete ideas/proposals for this scenario seems quite important.)
Caveat: All these takes are loosely held. Like many people, I’m still orienting to what o3 really means for AI governance/policy efforts. Would be curious for takes on this from folks like @Zvi, @davekasten, @ryan_greenblatt, @Dan H, @gwern, @Jeffrey Ladish, or others.
By several reports (e.g. here and here), OpenAI is throwing enormous amounts of training compute at o-series models. And if the new RL paradigm involves more decentralized training compute than the pretraining paradigm, that could lead to more consolidation into a few players, not less. Pretraining* is bottlenecked by the size of the largest single cluster: e.g. OpenAI’s biggest compute cluster is similar in size to xAI’s, even though OpenAI has access to much more compute overall. But if it’s just about who has the most compute overall, then the biggest players will win.
*though pretraining will probably shift to distributed training eventually
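A stylized version of that argument, with invented cluster sizes, just to show why it matters which quantity is the bottleneck:

```python
# Stylized comparison (cluster sizes invented, arbitrary units): under pretraining,
# frontier capability is limited by the single largest cluster; under a paradigm
# where RL/inference compute can be spread across sites, total compute matters.

lab_a_clusters = [100, 60, 50, 40]   # lots of compute overall, spread across sites
lab_b_clusters = [100]               # one big cluster, less compute overall

largest_cluster = {"A": max(lab_a_clusters), "B": max(lab_b_clusters)}
total_compute = {"A": sum(lab_a_clusters), "B": sum(lab_b_clusters)}

print("largest single cluster:", largest_cluster)  # tied: 100 vs 100
print("total compute:", total_compute)             # A pulls far ahead: 250 vs 100
```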
I think those governance proposals were worse than worthless anyway. They didn’t take into account rapid algorithmic advancement in peak capabilities and in training and inference efficiency. If this helps the governance folks shake off some of their myopic hopium, so much the better.
In my world model, the new paradigm is more governable and reduces x-risk, for several reasons (if timelines are short):
1 - Pretraining a big model and then running millions of copies of it at the same time gives us a very fast takeoff and far less time for preparation and iteration, compared to 50,000 slow, compute-heavy models.
2 - Deploying pre-AGI o3-type systems to the public will eat up a huge amount of compute, and may delay the pretraining of larger models if they are profitable enough.
3 - It is much easier to check their capabilities and shortcomings this way.
I’ve had similar thoughts previously: https://www.lesswrong.com/posts/wr2SxQuRvcXeDBbNZ/bogdan-ionut-cirstea-s-shortform?commentId=rSDHH4emZsATe6ckF.