"Why do you think that LLMs will hit a wall in the future?"

I more-or-less endorse the model described in "larger language models may disappoint you [or, an eternally unfinished draft]", and moreover I think language is an inherently lossy instrument, such that even the minimally-lossy model won't have perfectly learned the causal processes or whatever behind its production.
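One way to make the "lossy instrument" point precise (my gloss, not anything from the linked draft) is the data-processing inequality: if text is generated from some underlying causal state through a lossy channel, then nothing trained purely on the text can recover more about that state than the channel transmits. A minimal sketch, with W, T, and the hatted W all being my notation:

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% W = latent causal state of the world, T = text emitted about it,
% \widehat{W} = whatever a model infers from text alone.
% W -> T -> \widehat{W} is a Markov chain, so the data-processing
% inequality bounds what any text-trained model can recover:
\[
  I(W;\widehat{W}) \le I(W;T) < H(W)
  \quad \text{whenever $T$ does not determine $W$.}
\]
% The gap H(W) - I(W;T) is information about the causal process
% that even the loss-minimal language model cannot get from text.
\end{document}
```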
I fully expect LLMs to hit a wall (if not now, then in the future), but for any specific claim about timing, it's worth remembering that people frequently say it's happening soon or has already happened, and such claims will be wrong every time but one. Some past examples:
Facebook’s then-head of AI, December 2019: https://www.wired.com/story/facebooks-ai-says-field-hit-wall/
Gary Marcus, March 2022: https://nautil.us/deep-learning-is-hitting-a-wall-238440/
Donald Hobson, August 2022: https://www.lesswrong.com/posts/gqqhYijxcKAtuAFjL/a-data-limited-future
Epoch AI, November 2022 (estimating that high-quality language data would be exhausted by 2024; in 2024 they updated the projection to 2028; a toy version of this arithmetic follows the list): https://epoch.ai/blog/will-we-run-out-of-ml-data-evidence-from-projecting-dataset
Will Eden, February 2023 (thread): https://x.com/WilliamAEden/status/1630690003830599680
Sam Altman, April 2023: https://www.wired.com/story/openai-ceo-sam-altman-the-age-of-giant-ai-models-is-already-over/
Justis Mills, June 2024: https://www.lesswrong.com/posts/axjb7tN9X2Mx4HzPz/the-data-wall-is-important (he cites Leopold Aschenbrenner for more detail, but Aschenbrenner himself is optimistic, so I didn't link him directly)
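On the Epoch-style projections specifically: the core arithmetic is exponential growth in data demand against a roughly fixed stock, which is part of why the projected date moves by years when the inputs get revised. A toy sketch with made-up numbers, not Epoch's actual figures or methodology:

```python
# Toy illustration of a data-wall projection. All numbers here are
# hypothetical, chosen only to show the shape of the calculation.
import math

stock_tokens = 3e14      # hypothetical stock of usable human text
current_usage = 1e13     # hypothetical tokens used by frontier runs today
growth_per_year = 2.5    # hypothetical annual growth factor in data demand

# Years until demand exceeds stock: solve current_usage * g**t = stock.
years_left = math.log(stock_tokens / current_usage) / math.log(growth_per_year)
print(f"Data wall in ~{years_left:.1f} years under these assumptions")

# Small changes to the assumed stock or growth rate shift the date by
# years, which is one reason such projections keep getting pushed back.
```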