I think the term “AGI” is a bit of a historical artifact: it was coined before the deep learning era, when previous AI winters had made everyone in the field reluctant to think they could make any progress toward general intelligence. Instead, all AI had to be very extensively hand-crafted for the application in question. And then some people felt like they still wanted to do research on what the original ambition of AI had been, and wanted a term that’d distinguish them from all the other people who said they were doing “AI”.
So it was a useful term for distinguishing yourself from the very narrow AI research back then, but now that AI systems are already increasingly general, it doesn’t seem like a very useful concept anymore, and it’d be better to talk in terms of the more specific cognitive capabilities that a system has or doesn’t have.
> now that AI systems are already increasingly general
I want to point out that if you tried to quantify this properly, the argument falls apart (at least in my view). “All AI systems are increasingly general” would be false; there are still many useful but very narrow AI systems. “Some AI systems” would be true, but this highlights the continuing usefulness of the distinction.
One way out of this would be to declare that only LLMs and their ilk count as “AI” now, with more narrow machine learning just being statistics or something. I don’t like this because of the commonality of methods between LLMs and the rest of ML; it is still deep learning (and in many cases, transformers), just scaled down in every way.
Hmm, I guess that didn’t properly convey what I meant. More like, LLMs are general in a sense, but in a very weird sense where they can perform some things at a PhD level while simultaneously failing at some elementary-school-level problems. You could say that they are not “general as in capable of learning widely at runtime” but “general as in they can be trained to do an immensely wide set of tasks at training-time”.
And this is then a sign that the original concept is no longer very useful, since okay LLMs are “general” in a sense. But probably if you’d told most people 10 years ago that “we now have AIs that you can converse with in natural language about almost any topic, they’re expert programmers and they perform on a PhD level in STEM exams”, that person would not have expected you to follow up with “oh and the same systems repeatedly lose at tic-tac-toe without being able to figure out what to do about it”.
So now we’re at a point where it’s like “okay our AIs are ‘general’, but general does not seem to mean what we thought it would mean, instead of talking about whether AIs are ‘general’ or not we should come up with more fine-grained distinctions like ‘how good are they at figuring out novel stuff at runtime’, and maybe the whole thing about ‘human-level intelligence’ does not cut reality at the joints very well and we should instead think about what capabilities are required to make an AI system dangerous”.
A while ago I wrote a post on why I think a “generality” concept can be usefully distinguished from an “intelligence” concept. Someone with a PhD is, I argue, not more general than a child, just more intelligent. Moreover, I would even argue that humans are a lot more intelligent than chimpanzees, but hardly more general. More broadly, animals seem to be highly general, just sometimes quite unintelligent.
For example, they (we) are able to do predictive coding: predicting future sensory inputs in real time, reacting to them with movements, and learning from wrong predictions. This allows animals to be quite directly embedded in physical space and time (which solves “robotics”), instead of relying on a pretty specific and abstract API (like text tokens) that is not even real-time. Current autoregressive transformers can’t do that.
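To make the loop I have in mind concrete, here’s a minimal sketch (illustrative only; toy NumPy code with made-up dimensions and update rules, not any actual model or brain): the agent predicts its next sensory input, acts, observes what reality actually does, and updates online from the prediction error.

```python
import numpy as np

rng = np.random.default_rng(0)

OBS_DIM, ACT_DIM = 8, 2
lr = 0.01

# Weights for predicting the next observation from (observation, action),
# plus a fixed toy policy mapping observations to movements (left unlearned
# here for simplicity).
W_pred = rng.normal(scale=0.1, size=(OBS_DIM + ACT_DIM, OBS_DIM))
W_act = rng.normal(scale=0.1, size=(OBS_DIM, ACT_DIM))


def environment_step(obs, action):
    # Stand-in for physical reality: the next sensory input is produced by
    # the world itself, not by imitating human-written data.
    return np.tanh(obs + 0.5 * np.pad(action, (0, OBS_DIM - ACT_DIM))
                   + 0.05 * rng.normal(size=OBS_DIM))


obs = rng.normal(size=OBS_DIM)
for t in range(1000):
    action = np.tanh(obs @ W_act)              # react with a movement
    inp = np.concatenate([obs, action])
    predicted = inp @ W_pred                   # predict the next sensory input
    next_obs = environment_step(obs, action)   # reality responds in real time
    error = next_obs - predicted               # prediction error
    W_pred += lr * np.outer(inp, error)        # learn online from being wrong
    obs = next_obs
```

The point of the sketch is just that the learning signal comes from reality answering in real time, whereas an autoregressive transformer’s training signal comes from a fixed corpus of human-written tokens.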
An intuition for this is the following: if we could make an artificial mouse-intelligence, we likely could, quite easily, scale this model to human-intelligence and beyond, because the mouse brain doesn’t seem architecturally or functionally very different from a human brain. It’s just small. This suggests that mice are general intelligences (non-artificial GIs) like us. They are just not very smart. Like a small language model that has the same architecture as a larger one.
A more subtle point: predictive coding means learning from sensory data, and from trying to predict sensory data. The difference between predicting sensory data and predicting human-written text is that the former is, pretty directly, created by the physical world, while existing text is constrained by how intelligent the humans were who wrote it. So language models merely imitate humans by predicting their text, which leads to diminishing returns, while animals (humans) predict physical reality quite directly, which doesn’t have a similar ceiling. So scaling up a mouse-like AGI would likely quickly be followed by an ASI, while scaling up pretrained language models has led to diminishing returns once their text gets as smart as the humans who wrote it, as the diminishing results with Orion and other recent frontier base models have shown. Yes, scaling CoT reasoning is another approach to improving LLMs, but this is more like teaching a human how to think for longer rather than making them more intelligent.
> And then some people felt like they still wanted to do research on what the original ambition of AI had been, and wanted a term that’d distinguish them from all the other people who said they were doing “AI”.
And then at some point all the latter people switched to saying “machine learning” instead.
I think the point is kind of that what matters is not what specific cognitive capabilities it has, but whether whatever set it has is, in total, enough to allow it to address a sufficiently broad class of problems, more or less equivalent to what a human can do. It doesn’t matter how it does it.
Right, but I’m not sure if that’s a particularly important question to focus on. It is important in the sense that if an AI could do that, then it would definitely be an existential risk. But AI could also become a serious risk while having a very different kind of cognitive profile from humans. E.g. I’m currently unconvinced about short AI timelines—I thought the arguments for short timelines that people gave when I asked were pretty weak—and I expect that in the near future we’re more likely to get AIs that continue to have a roughly LLM-like cognitive profile.
And I also think it would be a mistake to conclude from this that existential risk from AI in the near future is insignificant, since an “LLM-like intelligence” might still become very, very powerful in some domains while staying vastly below the human level in others. But if people only focus on “when will we have AGI”, this point risks getting muddled, when it would be more important to discuss something like “what capabilities do we expect AIs to have in the future, what tasks would those allow the AIs to do, and what kinds of actions would that imply”.
I’m confused, why does that make the term no longer useful? There’s still a large distinction between companies focusing on developing AGI (OpenAI, Anthropic, etc.) vs those focusing on more ‘mundane’ advancements (Stability, Black Forest, the majority of ML research results). Though I do disagree that it was only used to distinguish them from narrow AI. Perhaps that was what it was originally, but it quickly turned into the roughly “general intelligence like a smart human” approximate meaning we have today.
I agree ‘AGI’ has become an increasingly vague term, but that’s because it is a useful distinction and so certain groups use it to hype. I don’t think abandoning a term because it is getting weakened is a great idea.
We should talk more about specific cognitive capabilities, but that isn’t stopped by us using the term AGI; it is stopped by not having people analyze whether X is an important capability for risk, or a capability for stopping risk.
Do my two other comments [1, 2] clarify that?