Both? If you increase only one of the two, the other becomes the bottleneck?
My impression, based on talking to people at labs plus stuff I’ve read, is that:
- Most AI researchers have no trouble coming up with useful ways of spending all of the compute available to them.
- Most of the expense of hiring AI researchers is the compute cost of their experiments rather than their salary.
- The big scaling labs try their best to hire the very best people they can get their hands on, and concentrate their resources heavily into just a few teams, rather than trying to hire everyone with a pulse who can rub two tensors together.

(Very open to correction by people closer to the big scaling labs.)
My model, then, says that compute availability is a constraint that binds much harder than programming or research ability, at least as things stand right now.
In Dwarkesh Patel’s interview with researcher friends, there was mention that AI researchers are already restricted by the compute granted to them for experiments, and probably also by the hours per week they are allowed to spend on novel “off the main path” research.
Sounds plausible to me, especially since benchmarks encourage a focus on the ability to hit the target at all rather than the ability to either succeed or fail cheaply, which is what matters in domains where the salary or electric bill of the experiment designer is an insignificant fraction of the total cost of the experiment.
But what would that world look like? [...] I agree that that’s a substantial probability, but it’s also an AGI-soon sort of world.
Yeah, I expect it’s a matter of “dumb” scaling plus experimentation rather than any major new insights being needed. If scaling hits a wall that training on generated data + fine tuning + routing + specialization can’t overcome, I do agree that innovation becomes more important than iteration.
My model is not just “AGI-soon” but “the more permissive thresholds for when something should be considered AGI have already been met, and more such thresholds will fall in short order, and so we should stop asking when we will get AGI and start asking about when we will see each of the phenomena that we are using AGI as a proxy for”.
I think you’re mostly correct about current AI researchers being able to usefully experiment with all the compute they have available.
I do think there are some considerations here, though.
How closely are they adhering to the “main path” of scaling existing techniques with minor tweaks? If you want to know how a minor tweak affects your current large model at scale, that is a very compute-heavy, researcher-time-light type of experiment. On the other hand, if you want to test a lot of novel paths at much smaller scales, then you are in a relatively compute-light but researcher-time-heavy regime.
What fraction of the available compute resources is the company assigning to each of training/inference/experiments? My guess is that the current split is somewhere around 63/33/4. If this were true, and the company decided to pivot away from training to focus on experiments (0/33/67), this would be something like a 16x increase in compute for experiments. So maybe that changes the bottleneck?
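A quick back-of-the-envelope sketch of that arithmetic, using the guessed splits above (purely illustrative numbers, not measured figures):

```python
# Back-of-the-envelope check of the compute-reallocation arithmetic above.
# The split fractions are guesses for illustration, not measured figures.
current = {"training": 0.63, "inference": 0.33, "experiments": 0.04}
pivoted = {"training": 0.00, "inference": 0.33, "experiments": 0.67}

# How much more experiment compute the hypothetical pivot would free up.
factor = pivoted["experiments"] / current["experiments"]
print(f"Experiment compute grows by roughly {factor:.1f}x")  # ~16.8x
```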
We do indeed seem to be at “AGI for most stuff”, but with a spiky envelope of capability that leaves some dramatic failure modes. So it does make more sense to ask something like, “For each remaining specific weakness X, what will the research agenda and timeline look like?”
This makes more sense than continuing to ask the vague “AGI complete” question when we are most of the way there already.