The game notation is pretty close to a board representation already. For most pieces you just go to their last move to see on which square they are standing. I assume that is very readable for an LLM, because it is able to keep all the tokens in mind simultaneously.
In my games with ChatGPT and GPT-4 (without the magic prompt), they both seemed to lose track of the position after the opening and completely fell apart. That might be because by then many pieces have moved several times (so there are competing moves indicating a square) and many pieces have vanished from the board altogether.
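To make the ‘just go to their last move’ heuristic concrete, here is a minimal sketch (assuming the python-chess library, which nothing in this thread actually uses) that replays a game while remembering only move destinations; the cases it silently gets wrong (the rook’s hop in castling, en-passant captures, promotions) are exactly the kind of detail a purely textual reader could be expected to drift on:

```python
import chess

def last_move_map(san_moves):
    """Guess where every piece stands by remembering only move destinations.

    Pieces that never move stay on their starting squares; otherwise a piece is
    assumed to stand wherever its most recent move ended. Captures are handled
    implicitly (the capturing piece overwrites the square), but the rook's move
    in castling and en-passant captures are not tracked, so the guess drifts.
    """
    board = chess.Board()  # ground-truth replay, used here only to parse SAN
    guess = {sq: pc.symbol() for sq, pc in board.piece_map().items()}
    for san in san_moves:
        move = board.parse_san(san)
        piece = board.piece_at(move.from_square)
        guess.pop(move.from_square, None)        # the piece left its old square...
        guess[move.to_square] = piece.symbol()   # ...and now stands on the new one
        board.push(move)
    return guess

guess = last_move_map(["e4", "e5", "Nf3", "Nc6", "Bb5", "a6"])
print({chess.square_name(sq): piece for sq, piece in guess.items()})
```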
> The game notation is pretty close to a board representation already. For most pieces you just go to their last move to see on which square they are standing. I assume that is very readable for an LLM, because it is able to keep all the tokens in mind simultaneously.
That raises an interesting question for world-modeling: does providing any hints or partial reveals of the hidden state make it easier or harder for a predictor to develop an internal world-model?
You could argue that it makes it easier, because look, a chunk of the hidden state is revealed right there and is being predicted, so of course that makes the task easier than if you were using some sort of even more implicit action-only representation like ‘move the left-knight 2 spaces forward then 1 right’. But you could also argue that it makes it harder, by providing shortcuts and heuristics which a predictor can latch onto to minimize most of its loss, sabotaging the development of a complex but correct world model. (Like providing a lazy kid with a cheatsheet to a test: yes, they could use it to study, carefully correcting their own errors… or they could just copy the provided answers and then guess on the rest.)
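As a toy illustration of that contrast (again assuming python-chess; the ‘relative’ encoding below is invented for the example, not a real notation), the same move can be written so that it hands the predictor an absolute square for free, or so that it reveals nothing about the hidden board:

```python
import chess

board = chess.Board()
move = board.parse_san("Nf3")

# Standard algebraic notation: the destination square f3 is a chunk of board
# state, revealed to the predictor on every single move.
san = board.san(move)

# An action-only encoding: just the displacement relative to the piece itself,
# with no absolute coordinates leaked.
d_rank = chess.square_rank(move.to_square) - chess.square_rank(move.from_square)
d_file = chess.square_file(move.to_square) - chess.square_file(move.from_square)
relative = f"knight {d_rank:+d} ranks, {d_file:+d} files"

print(san, "|", relative)   # Nf3 | knight +2 ranks, -1 files
```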
My intuition is that it would depend on optimization details like compute & data: the partial-state model would eventually outperform in terms of both predictive loss and world-model internals, but it would take longer, in some sense, than the blind model which is being forced from the start to try to develop a world-model. (Somewhat like grokking and other late-training ‘emergence’-esque phenomena.)

There was another Chess-GPT investigation into that question recently by Adam Karvonen: https://adamkarvonen.github.io/machine_learning/2024/01/03/chess-world-models.html
The linear probe accuracy for the board state actually peaks in the sixth layer (out of eight). To predict the next move, the model apparently already discards some of the state information in the later layers. Well, maybe that is unsurprising.
It also doesn’t reach terribly impressive accuracy. 99.2% sounds like a lot, but it is per square, which means it might get a square wrong in every second position.
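Back-of-the-envelope, and assuming errors are roughly independent across squares (which is probably optimistic), that per-square figure works out to about the error rate described:

```python
per_square_acc = 0.992
squares = 64

expected_wrong_squares = squares * (1 - per_square_acc)   # ~0.51 per position
p_position_fully_correct = per_square_acc ** squares      # ~0.60

print(expected_wrong_squares, p_position_fully_correct)
```

So on average about one misplaced square every two positions, and roughly 40% of positions with at least one error in the probed board state.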
I think more important than how easy it is to extract the information is how necessary it is to extract it. You can probably be somewhat fuzzy about board state details and still get great accuracy.
There is a step in move prediction where you go beyond intuitive move selection and have to calculate in order to find deeper reasons for and against moves. That feels similar to me to attending to your uncertainty about the placement of particular pieces beyond what is immediately necessary. And none of these models have taken that step yet.
I’m actually doing an analysis right now to nail down that GPTs don’t calculate ahead when trained on move prediction and stay entirely in the intuitive-move-selection regime, but it’s not easy to separate intuitive move selection from calculation in a bulletproof way.
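Purely as an illustration of why the separation is hard (this is not the analysis being described; `stockfish` below is just an assumed local UCI engine binary, and python-chess is assumed for the engine interface), one crude operationalization is to collect positions where a depth-1 search and a deep search disagree, and then check which side the model lands on:

```python
import chess
import chess.engine

def shallow_vs_deep(board, engine, shallow_depth=1, deep_depth=18):
    """Return (shallow_move, deep_move) when a depth-limited search and a deeper
    search disagree, i.e. positions that plausibly require some look-ahead."""
    shallow = engine.play(board, chess.engine.Limit(depth=shallow_depth)).move
    deep = engine.play(board, chess.engine.Limit(depth=deep_depth)).move
    return (shallow, deep) if shallow != deep else None

engine = chess.engine.SimpleEngine.popen_uci("stockfish")  # assumes Stockfish on PATH
print(shallow_vs_deep(chess.Board(), engine))
engine.quit()
```

A model that keeps siding with the shallow move on such positions looks ‘intuitive’; one that sides with the deep move might be calculating, or might simply have memorized the tactical pattern, which is exactly the bulletproofing problem.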
> The linear probe accuracy for the board state actually peaks in the sixth layer (out of eight).

Yes, that’s a very common observation. After all, you still have to try to model the current player’s planning & move based on the board state, and in the final layers, you also have to generate the actual prediction—the full 51k BPE logit array or whatever. That has to happen somewhere, and the final layers are the most logical place to do so. Same as with CNNs doing image classification: the final layer is a bad place to get an embedding from, because by that point, the CNN is changing the activations for the final categorical output.
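For the CNN analogy, a minimal sketch of the usual workaround (forward hooks on an intermediate layer rather than the head; `resnet18` is just a stand-in model here, nothing from the chess work) looks like this, and the board-state probes above are the same idea with a linear classifier fitted to each layer’s activations:

```python
import torch
from torchvision.models import resnet18

model = resnet18(weights=None).eval()
captured = {}

def save_as(name):
    def hook(module, inputs, output):
        captured[name] = output.detach().flatten(1)   # (batch, features)
    return hook

# Pre-logits pooled features vs. the final categorical output.
model.avgpool.register_forward_hook(save_as("penultimate"))
model.fc.register_forward_hook(save_as("logits"))

with torch.no_grad():
    model(torch.randn(4, 3, 224, 224))

for name, acts in captured.items():
    print(name, tuple(acts.shape))   # penultimate (4, 512), logits (4, 1000)
```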
> I think more important than how easy it is to extract the information is how necessary it is to extract it. You can probably be somewhat fuzzy about board state details and still get great accuracy.
Yes. This gets back to the core of ‘what is an imitation-learning LLM doing?’ Janus’s Simulators puts the emphasis on ‘it is learning the distribution, and learning to simulate worlds’; but the DRL perspective puts the emphasis on ‘it is learning to act like an agent to maximize its predictive-reward, and learns simulations/worlds only insofar as that is necessary to maximize reward’. It learns a world-model which chucks out everything which is unnecessary for maximizing reward: this is not a faithful model but a value-equivalent model.
If there is some aspect of the latent chess state which doesn’t, ultimately, help win games, then a chess LLM (or MuZero) doesn’t want to learn to model that part of the latent chess state, because it is, by definition, useless. (It may learn to model it, but for other reasons like accident or having over-generalized or because it’s not yet sure that part is useless or as a shortcut etc etc.)
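To be more precise about ‘value-equivalent’ (this is roughly the formulation from Grimm et al.’s value-equivalence paper, stated in their notation rather than anything from this thread): a learned model $\tilde{m}$ only has to agree with the true model $m$ on the Bellman backups of the values and policies you actually care about,

$$\tilde{m} \;\text{value-equivalent to}\; m \;\text{w.r.t.}\; \Pi, \mathcal{V} \iff \mathcal{T}^{\tilde{m}}_{\pi} v = \mathcal{T}^{m}_{\pi} v \quad \forall\, \pi \in \Pi,\; v \in \mathcal{V},$$

where $\mathcal{T}^{m}_{\pi} v(s) = \mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[r(s,a) + \gamma\, \mathbb{E}_{s' \sim m(\cdot \mid s,a)}\, v(s')\big]$. Any part of the true dynamics that never changes those backups, which here means any latent chess detail that never affects which move wins, can be dropped at no cost in reward.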
This hasn’t previously been important to the ‘do LLMs learn a world model?’ literature, because the emphasis has been on responding to naysayers like Bender, who claim that they do not and cannot learn any world model at all, and that a ‘superintelligent octopus’ eavesdropping on chess game transcripts would never learn to play chess beyond memorization. But since that claim is now generally accepted to be false, the questions move on to ‘since they do learn world models, what sorts, how well, why, and when?’
Which will include the question of how well they can learn to amortize planning. I am quite sure the answer is that they do so to some non-zero degree, and that you are wrong in general about GPTs never planning ahead, based on evidence like Jones’s scaling laws, which show no sharp transition between training and planning but only a smooth exchange rate between train-time and test-time compute, and the fact that you can so easily distill the results of planning into a GPT (eg. distilling the results of inner-monologue ‘planning’ into a single forward pass). So my expectation is that either your models will be too small to show clear signs of planning, or you will just get false nulls from inadequate model-interpretability methods—it’s not an easy thing to do, to figure out what a model is thinking and what it is not thinking, after all.
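On the distillation point, a toy sketch of what that amounts to in training-data terms (`generate_with_cot` and `finetune` are hypothetical stand-ins, not real APIs from any of the work discussed):

```python
def distill_planning(prompts, generate_with_cot, finetune):
    """Turn slow inner-monologue 'planning' into supervision for one-pass answers.

    generate_with_cot(prompt) -> (chain_of_thought, answer)   # hypothetical
    finetune(pairs)           -> fine-tuned model             # hypothetical
    """
    pairs = []
    for prompt in prompts:
        _cot, answer = generate_with_cot(prompt)  # expensive multi-step inference
        pairs.append((prompt, answer))            # keep only the final answer
    return finetune(pairs)                        # student now answers directly
```

If the student then matches the teacher-with-monologue on held-out prompts, some of that planning has evidently been amortized into its single forward pass.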