I believe that Marcus’ point is that there are classes of problems that tend to be hard for LLMs (biological reasoning, physical reasoning, social reasoning, practical reasoning, object and individual tracking, nonsequiturs). The argument is that problems in these class will continue to hard.
Yeah this is the part that seems increasingly implausible to me. If there is a “class of problems that tend to be hard … [and] will continue to be hard,” then someone should be able to build a benchmark that models consistently struggle with over time.
Yeah this is the part that seems increasingly implausible to me. If there is a “class of problems that tend to be hard … [and] will continue to be hard,” then someone should be able to build a benchmark that models consistently struggle with over time.