Likewise, LLMs are produced by a relatively simple training process (minimizing loss on next-token prediction, using a large training set drawn from the internet, GitHub, Wikipedia, etc.), but the resulting 175-billion-parameter model is extremely inscrutable.
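For concreteness, here is a minimal sketch of what “minimizing loss on next-token prediction” amounts to, assuming a PyTorch-style setup; the helper name and tensor shapes are my own illustration, not anything from an actual GPT training codebase.

```python
# A minimal sketch (assumed PyTorch-style setup, not real GPT training code):
# the "relatively simple training process" is cross-entropy on the next token.
import torch
import torch.nn.functional as F

def next_token_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    """logits: (batch, seq, vocab) model outputs; tokens: (batch, seq) integer ids."""
    pred = logits[:, :-1, :]   # positions 0..T-2 predict ...
    target = tokens[:, 1:]     # ... tokens 1..T-1 (shifted by one)
    return F.cross_entropy(pred.reshape(-1, pred.size(-1)), target.reshape(-1))

# Toy usage with random "model outputs" over a 50k-token vocabulary.
logits = torch.randn(2, 16, 50_000)
tokens = torch.randint(0, 50_000, (2, 16))
print(next_token_loss(logits, tokens))  # the single scalar the optimizer drives down
```

That one scalar objective is the whole of the training signal; everything else about the 175-billion-parameter model’s behavior comes out of optimizing it at scale.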
So the author is confusing the training process with the model. It’s like saying “although it may appear that humans are telling jokes and writing plays, all they are actually doing is optimizing for survival and reproduction”. This fallacy occurs throughout the paper.
The train/test framework is not helpful for understanding this. The dynamical-systems view is more useful (though beware that this starts to get close to the term “emergent behavior”, which we must be wary of). The interesting thing about chaos is that, while the behavior is not perfectly predictable, and may even be surprising, it has well-defined properties and mathematical constraints. Not everything is possible: the Lorenz system, for example, is chaotic, yet its trajectories remain confined to a bounded attractor (see the sketch below). In the same spirit, we need to take a step back and realize that the kind of “real AI” that people are afraid of would require causal modeling, which is mathematically impossible to construct from correlation alone. If the model is able to start making interventions in the world, then we need to consider the possibility that it will be able to construct a causal model. But that goes beyond predicting the next word, which is the scope of this article.
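To make the “unpredictable but bounded” point concrete, here is a minimal sketch using the classic Lorenz parameters; the code is my own illustration, not something from the paper under discussion.

```python
# A minimal sketch: the Lorenz system is chaotic, so individual trajectories are
# unpredictable in detail, yet every trajectory stays in a bounded region of state space.
import numpy as np

def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz equations with the classic parameters."""
    x, y, z = state
    dxdt = sigma * (y - x)
    dydt = x * (rho - z) - y
    dzdt = x * y - beta * z
    return state + dt * np.array([dxdt, dydt, dzdt])

state = np.array([1.0, 1.0, 1.0])
peak = 0.0
for _ in range(100_000):
    state = lorenz_step(state)
    peak = max(peak, np.abs(state).max())

# Surprising moment to moment, but "everything is not possible":
print("largest |coordinate| seen over the run:", peak)  # stays bounded, a few tens
```

The same structure is what I have in mind for LLMs: surprising local behavior, but operating under hard mathematical constraints.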
What I’m arguing is that what LLMs do goes way beyond predicting the next word. Next-token prediction is just the proximal means to an end, namely a coherent statement.