GPT-3.5 can play chess at the 1800 Elo level, which is terrifying and impossible without at least a chess world model.
I used to think that it would be very difficult for an LLM to build a model of chess because chess is not about words and sentences. But a discussion led to the realization that the chess model underlying chess notation is not that different from long-distance referents in (programming) languages. Imagine the 2D chess grid not as a physical board but as a doubly nested array with fixed length (the fixed length might make it even easier). GPT clearly can do that. And once it has learned that, all higher layers can focus on the rules (though without the MCTS of AlphaZero).
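A minimal sketch of that idea (names and helpers here are my own illustration, not anything GPT actually uses): the grid as a fixed-length doubly nested array, with an algebraic square like "e4" resolved to fixed indices, much like resolving a long-distance referent to a known location.

```python
FILES = "abcdefgh"

def empty_board():
    # 8x8 fixed-length nested array; rank 1 at index 0, "." marks an empty square
    return [["." for _ in range(8)] for _ in range(8)]

def square_to_index(square):
    # "e4" -> (rank_index, file_index) = (3, 4)
    file_idx = FILES.index(square[0])
    rank_idx = int(square[1]) - 1
    return rank_idx, file_idx

def apply_move(board, src, dst):
    # Move whatever sits on src to dst -- deliberately no legality check,
    # which is exactly where a shaky world model would show up
    r1, f1 = square_to_index(src)
    r2, f2 = square_to_index(dst)
    board[r2][f2], board[r1][f1] = board[r1][f1], "."

board = empty_board()
r, f = square_to_index("e2")
board[r][f] = "P"            # white pawn on e2
apply_move(board, "e2", "e4")
print(board[3][4])           # -> P: the pawn now sits on e4
```

Tracking board state this way is pure bookkeeping; the hard part (knowing which moves are legal from a given state) lives in the layers above, which is consistent with the illegal-moves observation below.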
Note that it also makes illegal moves from rare board states, which means its model of chess is pretty questionable.
I made an illegal move while playing over the board (5+3 blitz) yesterday and lost the game. Maybe my model of chess (even when seeing the current board state) is indeed questionable, but well, it apparently happens to grandmasters in blitz too.
I would highly recommend playing against it and trying to get it confused and out of distribution; it's very difficult, at least for me.