I believe we should view AGI as a ratio of capability to resources, rather than simply asking how AI’s abilities compare to humans’. This view is becoming more common, but is not yet common enough.
When people discuss AI’s abilities relative to humans without considering the associated costs or time, this is like comparing fractions by looking only at the numerators.
In other words, AGI has a numerator (capability): what the AI system can achieve. This asks questions like: For this thing that a human can do, can AI do it too? How well can AI do it? (For example, on a set of programming challenges, how many can the AI solve? How many can a human solve?)
But also importantly, AGI should account for the denominator: how many resources are required to achieve its capabilities. Commonly, this resource will be a $-cost or an amount of time. This asks questions like “What is the $-cost of getting this performance?” or “How long does this task take to complete?”.
I claim that an AI system might fail to qualify as AGI if it lacks human-level capabilities, but it could also fail by being wildly inefficient compared to a human. Both the numerator and denominator matter when evaluating these ratios.
A quick example of why focusing solely on the capability (numerator) is insufficient:
Imagine an AI software engineer that can do most tasks human engineers do, but at 100–1000× the cost of a human.
I expect that lots of the predictions about “AGI” would not come due until (at least) that cost comes down substantially so that AI >= human on a capability-per-dollar basis.
For instance, it would not make sense to directly substitute AI for human labor at this ratio—but perhaps it does make sense to buy additional AI labor, if there were extremely valuable tasks for which human labor is the bottleneck today.
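To make the arithmetic concrete, here is a minimal sketch of the capability-per-dollar comparison. Every number in it is made up, purely to illustrate the ratio framing:

```python
# Toy comparison of "capability per dollar" on a single class of tasks.
# All numbers below are hypothetical, chosen only to illustrate the framing.

human_success_rate = 0.90       # numerator: fraction of tasks a human engineer completes
human_cost_per_task = 200.0     # denominator: $ of human labor per task

ai_success_rate = 0.85          # nearly human-level on the numerator...
ai_cost_per_task = 200.0 * 300  # ...but ~300x the cost on the denominator

human_ratio = human_success_rate / human_cost_per_task
ai_ratio = ai_success_rate / ai_cost_per_task

print(f"human capability per dollar: {human_ratio:.6f}")
print(f"AI capability per dollar:    {ai_ratio:.6f}")
# Numerator-only comparison: the AI looks roughly at parity with the human.
# Ratio comparison: AI costs must fall by ~2-3 orders of magnitude before
# direct substitution for human labor makes economic sense.
```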
The AGI-as-ratio concept has long been implicit in some labs’ definitions of AGI—for instance, OpenAI describes AGI as “a highly autonomous system that outperforms humans at most economically valuable work”. Outperforming humans economically does seem to imply being more cost-effective, not just having the same capabilities. Yet even within OpenAI, the denominator aspect wasn’t always front of mind, which is why I wrote up a memo on this during my time there.
Until the recent discussions of o1 Pro / o3 drawing on lots of inference compute, I rarely saw these ideas discussed, even in otherwise sophisticated analyses. One notable exception is my former teammate Richard Ngo’s t-AGI framework, which deserves a read. METR has also done a great job of accounting for this in their research comparing AI R&D performance given a fixed amount of time. I am glad to see more and more groups thinking in these terms - but in casual analysis, it is very easy to slip into comparing capability levels alone. This is worth pushing back on, imo: the time at which “AI capabilities = human capabilities” is different from the time at which I expect AGI will have been achieved in the relevant senses.
There are also some important caveats to my claim here, that ‘comparing just by capability is missing something important’:
People reasonably expect AI to become cheaper over time, so if AI matches human capabilities but not cost, that might still signal ‘AGI soon.’ Perhaps this is what people mean when they say ‘o3 is AGI’.
Computers are much faster than humans for many tasks, and so one might believe that if an AI can achieve a thing it will quite obviously be faster than a human. This is less obvious now, however, because AI systems are leaning more on repeated sampling/selection procedures.
Comparative advantage is a thing, and so AI might have very large impacts even if it remains less absolutely capable than humans for many different tasks, if the cost/speed are good enough.
There are some factors that don’t fit super cleanly into this framework: things like AI’s ‘always-on availability’, which aren’t about capability per se, but probably belong in the numerator anyway? e.g., “How good is an AI therapist?” benefits from ‘you can message it around the clock’, which increases the utility of any given task-performance. (In this sense, maybe the ratio is best understood as utility-per-resource, rather than capability-per-resource.)
Human output does not scale linearly with the resources spent. Hiring 10 people costs you 10x as much as hiring 1, but it is not guaranteed that their teamwork will produce 10x the output. Sometimes team members want to do things differently, they have problems navigating each other’s code, etc.
So it could happen that “1 unit of AI” is more expensive and less capable than 1 human, and yet “10 units of AI” are more capable than 10 humans, and paying for “1000 units of AI” would be a fantastic deal, because as an average company you are unlikely to hire 1000 good programmers. Also, maybe the deal is that you pay for the AI only when you use it, but you cannot repeatedly hire and fire 1000 programmers.
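To illustrate the shape of this claim, here is a toy model. All parameters are made up; only the qualitative shape of the two curves is the point:

```python
# Toy model of team output vs. team size. All parameters are hypothetical.

def human_team_output(n, per_person=1.0, overhead=0.08):
    # Diminishing returns: the k-th person contributes per_person * (1 - overhead)^k
    # (coordination costs, merge conflicts, duplicated work, ...).
    return sum(per_person * (1 - overhead) ** k for k in range(n))

def ai_fleet_output(n, per_unit=0.75):
    # Assume (hypothetically) each AI unit is weaker than one human,
    # but units stack roughly linearly because coordination is cheap.
    return per_unit * n

for n in (1, 10, 100, 1000):
    print(n, round(human_team_output(n), 1), round(ai_fleet_output(n), 1))
# With these made-up numbers: 1 AI unit < 1 human, but the AI fleet
# pulls ahead of the human team at around 10 units, and the gap keeps
# widening from there.
```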
I agree, these are interesting points, upvoted. I’d claim that AI output also isn’t linear in resources, but you’re right that the curve of marginal returns from each AI unit could differ from the human curve in an important way. Likewise, the easier on-demand labor of AI is certainly a useful benefit.
I don’t think these contradict the thrust of my point though? That in each case, one shouldn’t just be thinking about usefulness/capability, but should also be considering the resources necessary for achieving this.
I agree that the resources matter. But I expect the resources-to-output curve to be so different from humans’ that even AIs that spend a lot of resources will turn out to be useful for some critical things, probably the kind where we would otherwise need many humans to cooperate.
But this is all just guessing on my end.
Also, I am not an expert, but it seems to me that in general, training an AI is expensive while using it is not. So if it already has the capability, using it is likely to be relatively cheap.