AIs are only capable of doing tasks that took 1-10 hours in 2024 (60%)
To me, these two are kind of hard to reconcile. Once we have AI doing 10-hour tasks (especially in AGI labs), the rate at which work gets done by the employees will probably be at least 5x what it is today. How hard is it to hit the singularity after that point? I certainly don’t think it’s less than 15% likely to happen within the months or years afterward.
Also, keep in mind that the capabilities of internal models will be higher than the capabilities of deployed models. So by the time we have 1-10 hour models deployed in the world, the AGI labs might have 10-100 hour models.
I agree that <15% seems too low for most reasonable definitions of 1-10 hours and the singularity. But I’d guess I’m more sympathetic than you, depending on the definitions Nathan had in mind.
I think both of the phrases “AI capable of doing tasks that took 1-10 hours” and “hit the singularity” are underdefined, and making them more precise could lead to significantly different probabilities here.
For “capable of doing tasks that took 1-10 hours in 2024”:
If we’re saying that “AI can do every cognitive task that takes a human 1-10 hours in 2024 as well as (edit: the best) human expert”, I agree it’s pretty clear we’re getting extremely fast progress at that point, not least because AI will be able to do the vast majority of tasks that take much longer than that by the time it can do all 1-10 hour tasks. However, if we’re using a weaker definition like the one Richard used (“on most cognitive tasks, it beats most human experts who are given 1-10 hours to perform the task”), I think it’s much less clear, due to human interaction bottlenecks.
Also, it seems like the distribution of relevant cognitive tasks that you care about changes a lot on different time horizons, which further complicates things.
Re: “hit the singularity”, I think in general there’s little agreement on a good definition here. E.g. the definition in Tom’s report is based on the doubling time of “effective compute in 2022-FLOP” shortening after “full automation”, and it’s unclear to me what that corresponds to in terms of real-world impact, since both of these terms are also underdefined and hard to translate into actual capability and impact metrics.
I would be curious to hear the definitions you and Nathan had in mind regarding these terms.
Yeah, I was trying to use Richard’s terms.
I also guess that the less training data there is, the less good the AIs will be. So while they may be good at setting up a dropshipping website for shoes (a 1-10 hour task), they may not be good at alignment research.
To me the singularity is when things are undeniably zooming, or perhaps even have zoomed. New AI tech is coming out daily, or perhaps there is even godlike AGI. What do folks think is a reasonable definition?
For “capable of doing tasks that took 1-10 hours in 2024”, I was imagining an AI that’s roughly as good as a software engineer that gets paid $100k-$200k a year.
For “hit the singularity”, this one is pretty hazy. I think I’m imagining that the Metaculus AGI question has resolved YES, and that the superintelligence question has possibly also resolved YES. I think I’m imagining a point where AI is better than 99% of human experts at 99% of tasks. Although I think it’s pretty plausible that we could enter enormous economic growth with AI that’s roughly as good as humans at most things (I expect the main things stopping this to be voluntary non-deployment and government intervention).
Yeah that sounds about right. A junior dev who needs to be told to do individual features.
Your “hit the singularity” definition doesn’t sound wrong, but I’ll need to think.
If AIs of the near future can’t do good research (and instead are only proficient in concepts that have significant presence in datasets), singularity remains bottlenecked by human research speed. The way such AIs speed things up is through their commercial success making more investment in scaling possible, not directly (and there is little use for them in the lab). It’s currently unknown if scaling even at $1 trillion level is sufficient by itself, so some years of Futurama don’t seem impossible, especially as we are only talking 2029.
I think that AIs will be able to do 10 hours of research (at the level of a software engineer that gets paid $100k a year) within 4 years with 50% probability.
If we look at current systems, there’s not much indication that AI agents will be superhuman in non-AI-research tasks and subhuman in AI research tasks. One of the most productive uses of AI so far has been in helping software engineers code better, so I’d wager AI assistants will be even more helpful for AI research than for other things (compared to some prior based on those tasks’ “inherent difficulties”). Additionally, AI agents can do some basic coding using proper codebases and projects, so I think scaffolded GPT-5 or GPT-6 will likely be able to do much more than GPT-4.
That’s the crux of this scenario, whether current AIs with near future improvements can do research. If they can, with scaling they only do it better. If they can’t, scaling might fail to help, even if they become agentic and therefore start generating serious money. That’s the sense in which AIs capable of 10 hours of work don’t lead to game-changing acceleration of research, by remaining incapable of some types of work.
What seems inevitable at the moment is AIs gaining world models where they can reference any concepts that frequently come up in the training data. This promises proficiency in arbitrary routine tasks, but not necessarily construction of novel ideas that lack sufficient footprint in the datasets. Ability to understand such ideas in-context when explained seems to be increasing with LLM scale though, and might be crucial for situational awareness needed for becoming agentic, as every situation is individually novel.
Note that AI doesn’t need to come up with original research ideas or do much original thinking to speed up research by a bunch. Even if it only speeds up the menial labor of writing code, running experiments, and doing basic data analysis at scale, then if you free up 80% of your researchers’ time, your researchers can now spend all of their time on the important tasks, which means overall cognitive labor on those tasks is 5x faster. This is ignoring effects from using your excess schlep labor to trade against non-schlep labor, leading to even greater gains in efficiency.
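To spell out that arithmetic, here is a toy calculation (just an illustration of the claim above; `automated_share` is my own made-up variable, not a term from any of the models discussed here):

```python
# Toy arithmetic for the "free up 80% of researchers' time => 5x" claim above.
# If AI absorbs `automated_share` of a researcher's workload and every freed
# hour goes into the remaining high-judgment work, the time spent on that
# work scales by 1 / (1 - automated_share).
def labor_multiplier(automated_share: float) -> float:
    assert 0.0 <= automated_share < 1.0
    return 1.0 / (1.0 - automated_share)

print(labor_multiplier(0.8))  # 5.0
print(labor_multiplier(0.5))  # 2.0
```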
I think that, ignoring pauses or government intervention, once AGI labs internally have AIs that are capable of doing 10 hours of R&D-related tasks (software engineering, running experiments, analyzing data, etc.), the amount of effective cognitive labor per unit time being put into AI research will probably go up by at least 5x compared to current rates.
Imagine the current AGI capabilities employee’s typical work day. Now imagine they had an army of AI assistants that can very quickly do 10 hours’ worth of their own labor. How much more productive is that employee compared to their current state? I’d guess at least 5x. See section 6 of Tom Davidson’s takeoff speeds framework for a model.
That means by 1 year after this point, an equivalent of at least 5 years of labor will have been put into AGI capabilities research. Physical bottlenecks still exist, but is it really that implausible that the capabilities workforce would stumble upon huge algorithmic efficiency improvements? Recall that current algorithms are much less efficient than the human brain. There’s lots of room to go.
The modal scenario I imagine for a 10-hour-AI scenario is that once such an AI is available internally, the AGI lab uses it to speed up its workforce by many times. That sped up workforce soon (within 1 year) achieves algorithmic improvements which put AGI within reach. The main thing stopping them from reaching AGI in this scenario would be a voluntary pause or government intervention.
I don’t understand the reasoning here. It seems like you’re saying “Well, there might be compute bottlenecks, but we have so much room left to go in algorithmic improvements!” But the room-for-improvement point already applies right now, and seems orthogonal to the compute bottlenecks point.
E.g. if compute bottlenecks are theoretically enough to turn the 5x cognitive labor into only 1.1x overall research productivity, it will still be the case that there is lots of room for improvement but the point doesn’t really matter as research productivity hasn’t sped up much. So to argue that the situation has changed dramatically you need to argue something about how big of a deal the compute bottlenecks will in fact be.
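To illustrate the kind of gap I mean, here is a toy two-factor CES calculation (the alpha and rho values below are made-up illustrative parameters, not the ones in Tom’s report): 5x cognitive labor against fixed compute can land anywhere from a ~2.6x down to a ~1.1x overall multiplier, depending on how substitutable compute and labor are.

```python
# Toy two-factor CES for research output, with compute C and cognitive labor L.
# A simplified stand-in for the bottleneck intuition, NOT the model in Tom
# Davidson's report; alpha and rho are illustrative.
def research_output(C: float, L: float, alpha: float = 0.5, rho: float = -2.0) -> float:
    # rho < 0 means compute and labor are gross complements (hard to substitute).
    return (alpha * C**rho + (1 - alpha) * L**rho) ** (1 / rho)

for rho in (0.5, -0.5, -2.0, -5.0):
    base = research_output(1.0, 1.0, rho=rho)     # equals 1.0 for any rho
    boosted = research_output(1.0, 5.0, rho=rho)  # 5x cognitive labor, same compute
    print(f"rho={rho:+.1f}: {boosted / base:.2f}x overall research output")
# Prints roughly 2.62x, 1.91x, 1.39x, 1.15x: the stronger the complementarity
# (more negative rho), the closer the multiplier stays to 1x despite 5x labor.
```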
I was more making the point that, if we enter a regime where AI can do 10-hour SWE tasks, then this will result in big algorithmic improvements, but at some point pretty quickly effective compute improvements will level out because of physical compute bottlenecks. My claim is that the point at which it levels out will be after multiple years’ worth of current algorithmic progress has been “squeezed out” of the available compute.
Interesting, thanks for clarifying. It’s not clear to me that this is the right primary frame to think about what would happen, as opposed to just thinking first about how big compute bottlenecks are and then adjusting the research pace for that (and then accounting for diminishing returns to more research).
I think a combination of both perspectives is best, as the argument in favor of your frame is that there will be some low-hanging fruit from changing your workflow to adapt to the new cognitive labor.
Can you elaborate how you’re translating 10-hour AI assistants into a 5x speedup using Tom’s CES model?
My reasoning is something like: roughly 50-80% of tasks are automatable with AI that can do 10 hours of software engineering, and under most sensible parameters this results in at least a 5x speedup. I’m aware this is kinda hazy and doesn’t map 1:1 with the CES model, though.
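For what it’s worth, here is the back-of-the-envelope version of what I have in mind (a toy task-based CES aggregator with made-up rho values, not the parameterization of Tom’s actual model): treat research as a unit mass of tasks, automate a fraction of them with effectively unbounded AI labor, and reallocate human labor evenly over the rest.

```python
# Toy task-based CES: a unit mass of tasks aggregated with exponent rho < 0
# (tasks are complements). A fraction `a` is fully automated with abundant AI
# labor; human labor spreads evenly over the rest. In that limit the output
# multiplier vs. no automation works out to (1 - a) ** ((1 - rho) / rho).
# Illustrative only; not the parameters of Tom Davidson's actual CES model.
def speedup(a: float, rho: float) -> float:
    assert 0.0 < a < 1.0 and rho < 0.0
    return (1.0 - a) ** ((1.0 - rho) / rho)

for a in (0.5, 0.8):
    for rho in (-0.5, -1.0, -3.0):
        print(f"automated share {a:.0%}, rho={rho:+.1f}: {speedup(a, rho):.1f}x")
# Near the Leontief limit (rho -> -inf) the multiplier approaches 1/(1-a),
# i.e. 2x at 50% automation and 5x at 80%; weaker complementarity gives more.
```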