The error margin is larger than it appears as estimated from tokens, due to combinatorial complexity. There are many paths that the output can take and still produce an acceptable answer, but an LLM needs to stay consistent depending on which path is chosen. The next level is to combine paths consistently, such that the dependence of one path on another is correctly learned. This means you get a latent space where the LLM learns the representation of the world using paths and paths between paths. With other words, it learns the topology of the world and is able to map this topology onto different spaces, which appear as conversations with humans.
I’m not sure I understand correctly: you are saying that it’s easier for the LLM to stay coherent in the relevant sense than it looks from LeCun’s argument, right?
The error margin LeCun used is for independent probabilities, while in an LLM the paths that the output takes become strongly dependent on each other to stay consistent within a path. Once an LLM masters grammar and use of language, it stays consistent within the path. However, you get the same problem, but now repeated on a larger scale.
Think about it as tiling the plane using a building block which you use to create larger building blocks. The larger building block has similar properties but at a larger scale, so errors propagate, but slower.
If you use independent probabilities, then it is easy to make it look as if LLMs are diverging quickly, but they are not doing that in practice. Also, if you have 99% of selecting one token as output, then this is not observed by the LLM predicting next token. It “observes” its previous token as 100%. There is a chance that the LLM produced the wrong token, but in contexts where the path is predictable, it will learn to output tokens consistently enough to produce coherence.
Humans often think about probabilities as inherently uncertain, because we have not evolved the intuition for nuance to understand probabilities at a deeper level. When an AI outputs actions, probabilities might be thought of as an “interpretation” of the action over possible worlds, while the action itself is a certain outcome in the actual world.
The error margin is larger than it appears as estimated from tokens, due to combinatorial complexity. There are many paths that the output can take and still produce an acceptable answer, but an LLM needs to stay consistent depending on which path is chosen. The next level is to combine paths consistently, such that the dependence of one path on another is correctly learned. This means you get a latent space where the LLM learns the representation of the world using paths and paths between paths. With other words, it learns the topology of the world and is able to map this topology onto different spaces, which appear as conversations with humans.
I’m not sure I understand correctly: you are saying that it’s easier for the LLM to stay coherent in the relevant sense than it looks from LeCun’s argument, right?
The error margin LeCun used is for independent probabilities, while in an LLM the paths that the output takes become strongly dependent on each other to stay consistent within a path. Once an LLM masters grammar and use of language, it stays consistent within the path. However, you get the same problem, but now repeated on a larger scale.
Think about it as tiling the plane using a building block which you use to create larger building blocks. The larger building block has similar properties but at a larger scale, so errors propagate, but slower.
If you use independent probabilities, then it is easy to make it look as if LLMs are diverging quickly, but they are not doing that in practice. Also, if you have 99% of selecting one token as output, then this is not observed by the LLM predicting next token. It “observes” its previous token as 100%. There is a chance that the LLM produced the wrong token, but in contexts where the path is predictable, it will learn to output tokens consistently enough to produce coherence.
Humans often think about probabilities as inherently uncertain, because we have not evolved the intuition for nuance to understand probabilities at a deeper level. When an AI outputs actions, probabilities might be thought of as an “interpretation” of the action over possible worlds, while the action itself is a certain outcome in the actual world.