This is fascinating, and is further evidence to me that LLMs contain models of reality. I get frustrated with people who say LLMs “just” predict the next token, or that they are simply copying and pasting bits of text from their training data. That argument skips over the fact that accurately predicting the next token requires compressing the training data down to something that looks a lot like a mostly accurate model of the world. In other words, if you have a large set of data entangled with reality, then the simplest model that predicts that data looks like reality.
This model of reality can be used to infer things which aren’t explicitly in the training data—like distances between places which aren’t mentioned together in the training data.
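As a toy sketch of how that kind of inference could fall out of a learned representation (this is not the referenced work's actual setup; the cities, embedding dimension, and synthetic vectors below are all made up), imagine fitting a linear probe that maps per-city embeddings to coordinates, then decoding cities the probe was never fit on and measuring the distance between them:

```python
# Toy sketch (hypothetical data): recover coordinates from "embeddings" with a
# linear probe, then compute a distance no training pair ever stated directly.
import numpy as np

rng = np.random.default_rng(0)

# Pretend these are embedding vectors a model learned for each city.
# Here they're synthetic: a random linear mix of (lat, lon) plus a little noise.
cities = {
    "Paris": (48.9, 2.4), "Cairo": (30.0, 31.2), "Tokyo": (35.7, 139.7),
    "Lima": (-12.0, -77.0), "Oslo": (59.9, 10.8), "Perth": (-31.9, 115.9),
}
D = 32                                 # embedding dimension (made up)
mix = rng.normal(size=(2, D))          # hidden "encoding" of lat/lon
emb = {c: np.array(ll) @ mix + rng.normal(scale=0.5, size=D)
       for c, ll in cities.items()}

# Fit a linear probe (least squares) on a few cities with known coordinates...
train = ["Paris", "Cairo", "Tokyo", "Lima"]
X = np.stack([emb[c] for c in train])
Y = np.array([cities[c] for c in train])
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# ...then read off coordinates for cities the probe never saw,
# and compute a distance between a pair that was never stated anywhere.
def haversine_km(p, q):
    lat1, lon1, lat2, lon2 = map(np.radians, (*p, *q))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * np.arcsin(np.sqrt(a))

oslo_hat, perth_hat = emb["Oslo"] @ W, emb["Perth"] @ W
print("inferred Oslo to Perth distance:", round(haversine_km(oslo_hat, perth_hat)), "km")
```

The point is just that once the representation encodes the geometry, pairwise distances come for free, even for pairs of places that never co-occur in the data.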
A Turing machine just predicts (with 100% accuracy) the symbol it will write, its next state, and its next position. And that happens to be enough for many interesting things.
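To make that concrete, here is a toy machine in that spirit: every step is a deterministic “prediction” of the symbol to write, the next state, and the next head position, and that alone is enough to, say, increment a binary number (the tape encoding and transition table are my own illustrative example):

```python
# Toy Turing machine: each step "predicts" (writes) the next symbol, state, and
# head position with certainty. This made-up transition table increments a
# binary number written on the tape.
from collections import defaultdict

# (state, symbol) -> (symbol_to_write, head_move, next_state); move is -1 or +1
RULES = {
    ("seek_end", "0"): ("0", +1, "seek_end"),
    ("seek_end", "1"): ("1", +1, "seek_end"),
    ("seek_end", " "): (" ", -1, "carry"),
    ("carry", "1"):    ("0", -1, "carry"),   # 1 + carry -> 0, keep carrying
    ("carry", "0"):    ("1", -1, "halt"),    # 0 + carry -> 1, done
    ("carry", " "):    ("1", -1, "halt"),    # ran off the left edge: new digit
}

def run(bits: str) -> str:
    tape = defaultdict(lambda: " ", enumerate(bits))   # blank-padded tape
    state, pos = "seek_end", 0
    while state != "halt":
        write, move, state = RULES[(state, tape[pos])]
        tape[pos] = write
        pos += move
    lo, hi = min(tape), max(tape)
    return "".join(tape[i] for i in range(lo, hi + 1)).strip()

print(run("1011"))   # prints 1100, i.e. 11 + 1 = 12 in binary
```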