RE-bench tasks (see page 7 here) are not the kind of AI research where you’re developing new AI paradigms and concepts. The tasks are much more straightforward than that. So your argument is basically assuming without argument that we can get to AGI with just the more straightforward stuff, as opposed to new AI paradigms and concepts.
If we do need new AI paradigms and concepts to get to AGI, then there would be a chicken-and-egg problem in automating AI research. Or more specifically, there would be two categories of AI R&D, with the less important category (e.g. performance optimization and other RE-bench-type tasks) being automatable by near-future AIs, and the more important category (developing new AI paradigms and concepts) not being automatable.
(Obviously you’re entitled to argue / believe that we don’t need new AI paradigms and concepts to get to AGI! It’s a topic where I think reasonable people disagree. I’m just suggesting that it’s a necessary assumption for your argument to hang together, right?)
I disagree. I think the existing body of published computer science and neuroscience research is chock full of loose threads: tons of potential innovations just waiting to be harvested by automated researchers. I’ve mentioned this idea elsewhere. I call it an ‘innovation overhang’.
Simply testing interpolations and extrapolations (e.g. scaling up old forgotten ideas on modern hardware) seems highly likely to reveal plenty of successful new concepts, even if the hit rate per attempt is low.
I think this means a better benchmark would consist of: taking two existing papers, finding a plausible hypothesis which combines the assumptions from the papers, designing and coding and running tests, then reporting on the results.
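To make that benchmark shape concrete, a single task item might look something like the sketch below. This is purely illustrative: the field names and required artifacts are my own assumptions, not part of RE-bench or any existing benchmark spec.

```python
from dataclasses import dataclass

@dataclass
class CombinationTask:
    """One hypothetical task item: combine ideas from two existing papers.

    Every field name here is an illustrative assumption, not part of RE-bench
    or any published benchmark.
    """
    paper_a: str                      # citation or arXiv ID of the first paper
    paper_b: str                      # citation or arXiv ID of the second paper
    compute_budget_gpu_hours: float   # cap on experiment compute for the attempt

    def expected_artifacts(self) -> list[str]:
        # What the automated researcher would have to hand back for grading.
        return [
            "hypothesis.md",      # a plausible hypothesis combining the two papers' assumptions
            "experiment/",        # code that designs, implements, and runs the test
            "results_report.md",  # write-up of what the test actually showed
        ]

# Example item with deliberately generic placeholder papers:
task = CombinationTask(
    paper_a="<older paper with an under-explored idea>",
    paper_b="<recent paper whose setup could stress-test it>",
    compute_budget_gpu_hours=100.0,
)
print(task.expected_artifacts())
```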
So I don’t think “no new concepts” is a necessary assumption for getting to AGI quickly with the help of automated researchers.
Simply testing interpolations and extrapolations (e.g. scaling up old forgotten ideas on modern hardware) seems highly likely to reveal plenty of successful new concepts, even if the hit rate per attempt is low.
Is this bottlenecked by programmer time or by compute cost?
Both? If you increase only one of the two, the other becomes the bottleneck?
I agree this means that the decision to devote substantial compute both to inference and to running the experiments designed by AI researchers is a large cost. Presumably, as the competence of the AI researchers gets higher, it feels easier to trust them not to waste their assigned experiment compute.
In Dwarkesh Patel’s interview with his researcher friends, there was mention that AI researchers are already restricted by the compute granted to them for experiments, and probably also by the work hours per week they are allowed to spend on novel “off the main path” research.
So in order for there to be a big surge in AI R&D there’d need to be prioritization of that at a high level. This would be a change of direction from focusing primarily on scaling current techniques rapidly, and putting out slightly better products ASAP.
So yes, if you think that this priority shift won’t happen, then you should doubt that the increase in R&D speed my model predicts will occur.
But what would that world look like? Probably a world where scaling continues to pay dividends, and getting to AGI is more straightforward than Steve Byrnes or I expect.
I agree that that’s a substantial probability, but it’s also an AGI-soon sort of world.
I argue that for AGI to be not-soon, you need both scaling and algorithm research to fail.
Both? If you increase only one of the two, the other becomes the bottleneck?
My impression, based on talking to people at labs plus stuff I’ve read, is that:
Most AI researchers have no trouble coming up with useful ways of spending all of the compute available to them.
Most of the expense of hiring AI researchers is the compute cost of their experiments rather than their salary.
The big scaling labs try their best to hire the very best people they can get their hands on and concentrate their resources heavily into just a few teams, rather than trying to hire everyone with a pulse who can rub two tensors together.
(Very open to correction by people closer to the big scaling labs).
My model, then, says that compute availability is a constraint that binds much harder than programming or research ability, at least as things stand right now.
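As a rough illustration of how that constraint could bind, here is a toy back-of-envelope comparison. Every number below is an assumed placeholder chosen only to show the shape of the argument, not a figure from any lab.

```python
# Toy comparison of a researcher's salary vs. the compute bill for their experiments.
# All numbers are placeholder assumptions for illustration only.

salary_per_year = 500_000       # fully-loaded researcher cost, USD/year (assumption)
gpu_hour_cost = 2.0             # USD per GPU-hour (assumption)
gpus_per_experiment = 256       # (assumption)
hours_per_experiment = 72       # (assumption)
experiments_per_year = 50       # (assumption)

compute_per_year = (gpu_hour_cost * gpus_per_experiment
                    * hours_per_experiment * experiments_per_year)

print(f"salary:   ${salary_per_year:>12,.0f} / year")
print(f"compute:  ${compute_per_year:>12,.0f} / year")
print(f"compute / salary ratio: {compute_per_year / salary_per_year:.1f}x")
```

Under those assumed numbers the experiment bill comes out to a few times the salary, which is the sense in which adding more programming or research ability without adding compute would not relieve the bottleneck.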
In Dwarkesh Patel’s interview with his researcher friends, there was mention that AI researchers are already restricted by the compute granted to them for experiments, and probably also by the work hours per week they are allowed to spend on novel “off the main path” research.
Sounds plausible to me. Especially since benchmarks encourage a focus on the ability to hit the target at all, rather than the ability to either succeed or fail cheaply, which is what’s important in domains where the salary / electric bill of the experiment designer is an insignificant fraction of the total cost of the experiment.
But what would that world look like? [...] I agree that that’s a substantial probability, but it’s also an AGI-soon sort of world.
Yeah, I expect it’s a matter of “dumb” scaling plus experimentation rather than any major new insights being needed. If scaling hits a wall that training on generated data + fine tuning + routing + specialization can’t overcome, I do agree that innovation becomes more important than iteration.
My model is not just “AGI-soon” but “the more permissive thresholds for when something should be considered AGI have already been met, and more such thresholds will fall in short order, and so we should stop asking when we will get AGI and start asking about when we will see each of the phenomena that we are using AGI as a proxy for”.
It sounds like your disagreement isn’t with drawing a link from RE-bench to (forecasts for) automating research engineering, but is instead with thinking that you can get AGI shortly after automating research engineering due to AI R&D acceleration and already being pretty close. Is that right?
Note that the comment says research engineering, not research science.