I believe that Marcus’ point is that there are classes of problems that tend to be hard for LLMs (biological reasoning, physical reasoning, social reasoning, practical reasoning, object and individual tracking, non sequiturs). The argument is that problems in these classes will continue to be hard. [1]
But I think there’s a larger issue. A lot of the discussion involves hostility to a given critic of AI for “moving the goal posts”. As described, Model X(1) is introduced, a critic notices limitation L(1), Model X(2) addresses it, the critic says they’re unconvinced and notes limitation L(2), and so on. The critic of these critics says this approach is unfair, a bad argument, etc.
However, what the “moving the goal posts” objection misses, in my opinion, is the context of the claim being made when someone says X(n) is generally intelligent. This claim isn’t about giving the creator of a model credit or an award. The claim is about whether a thing has a flexibility akin to that of a human being (especially the flexible, robust goal-seeking ability of a human, an ability that could make a thing dangerous), and we don’t actually have a clear, exact formulation of what the flexible intelligence of a human consists of. The Turing Test might not be the best AGI test, but it’s posed in an open-ended fashion because there’s no codified set of “prove you’re like a human” questions.
Which is to say, Gary Marcus aside, if models keep advancing and people keep finding new capacities that each model lacks, it will be perfectly reasonable to describe the situation as “it’s not AGI yet”, as long as those capacities are clearly significant capacities of human intelligence. There wouldn’t even need to be a set pattern to the capacities critics cite. Again, it’s not about argument fairness, etc.; it’s that this sort of thing is all we have, for now, as a test of AGI.

[1] https://garymarcus.substack.com/p/what-does-it-mean-when-an-ai-fails
> I believe that Marcus’ point is that there are classes of problems that tend to be hard for LLMs (biological reasoning, physical reasoning, social reasoning, practical reasoning, object and individual tracking, non sequiturs). The argument is that problems in these classes will continue to be hard.
Yeah this is the part that seems increasingly implausible to me. If there is a “class of problems that tend to be hard … [and] will continue to be hard,” then someone should be able to build a benchmark that models consistently struggle with over time.
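To make that concrete, a benchmark of the kind being asked for would just be a frozen problem set whose scores are recorded for each successive model release; if a problem class really stays hard, the score series should stay flat rather than climb. Here is a minimal sketch in Python, purely as an assumption of what such a harness could look like; the problem items, `evaluate_model`, and `model_answer_fn` are all made up for illustration, not an existing benchmark or API.

```python
# Hypothetical sketch: a frozen "hard problem class" benchmark tracked across
# successive model releases. All names and items here are illustrative.
from dataclasses import dataclass

@dataclass
class Problem:
    prompt: str
    expected: str  # reference answer used for simple substring scoring


# A frozen problem set: once published it never changes, so scores stay
# comparable across model generations.
PROBLEMS = [
    Problem("If I put a coin in a cup and turn the cup upside down, "
            "where is the coin?", "on the table"),
    Problem("Sally has three brothers; each brother has two sisters. "
            "How many sisters does Sally have?", "one"),
]


def evaluate_model(model_answer_fn, problems=PROBLEMS) -> float:
    """Return the fraction of problems answered correctly.

    model_answer_fn: any callable mapping a prompt string to an answer
    string, e.g. a wrapper around whichever model is being tested.
    """
    correct = 0
    for p in problems:
        answer = model_answer_fn(p.prompt).strip().lower()
        if p.expected in answer:
            correct += 1
    return correct / len(problems)


# Track results per release; under the "this class stays hard" claim,
# these numbers should not trend toward 1.0 over time.
scores_over_time = {
    # "model-v1": evaluate_model(model_v1_answer_fn),
    # "model-v2": evaluate_model(model_v2_answer_fn),
}
```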