We collect chess game data from a one-month dump of the Lichess dataset, deliberately distinct from the month used in our own Lichess dataset. We design several model-based tasks including converting PGN to FEN, transferring UCI to FEN, and predicting legal moves, etc., resulting in 1.9M data samples.
Looks like they do a lot of things with FEN and train on a corpus which includes some FEN<->move-based-representation tasks, but they don’t really do anything which directly bears on this, aside from showing that their chess GPTs can turn UCI/sPGNs into FENs with high accuracy. That would seem to imply that they have learned a good model of chess, because otherwise how could a model take a move-by-move description (like UCI/PGN) and print out an accurate summary of the final board state (like FEN)? (See the sketch below for what that conversion entails.)
I would’ve preferred to see them do something like train FEN-only vs PGN-only chess GPTs (where the games are identical, just the encoding is different) and demonstrate the former is better, where ‘better’ means something like achieving a higher Elo or not getting worse towards the end of the game.
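For concreteness, here is a minimal sketch of what the UCI/PGN-to-FEN conversions (and legal-move prediction) involve; it uses the python-chess library as an assumption and is not the paper’s actual pipeline. The point is that producing the correct final FEN from a move-by-move description requires replaying and tracking the full board state:

```python
# Minimal sketch, assuming the python-chess library; not the paper's pipeline.
import io

import chess
import chess.pgn

# UCI -> FEN: push each half-move onto a board, then read off the FEN.
board = chess.Board()
for mv in ["e2e4", "e7e5", "g1f3", "b8c6", "f1b5"]:   # a Ruy Lopez opening
    board.push_uci(mv)
print(board.fen())
# r1bqkbnr/pppp1ppp/2n5/1B2p3/4P3/5N2/PPPP1PPP/RNBQK2R b KQkq - 3 3

# PGN -> FEN: parse the SAN movetext and replay it the same way.
game = chess.pgn.read_game(io.StringIO("1. e4 e5 2. Nf3 Nc6 3. Bb5"))
board = game.board()
for mv in game.mainline_moves():
    board.push(mv)
print(board.fen())   # identical FEN to the UCI path above

# "Predicting legal moves" for a position follows from the same board object.
print(sorted(board.san(m) for m in board.legal_moves)[:5])
```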
I did train a transformer to predict moves from board positions (not strictly FEN, because with FEN the positional encodings don’t point to the same squares consistently). Maybe I’ll get around to letting it compete against the different GPTs.
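To illustrate the positional-encoding point: FEN compresses runs of empty squares into digits, so the character at a given index can refer to different squares in different positions. Below is a hypothetical sketch (not the parent commenter’s actual encoding) of a fixed-width alternative in which token i always denotes the same square, so standard positional encodings line up with the board:

```python
# Hypothetical fixed-width board encoding; board_to_tokens is an illustrative
# helper, not code from the thread.
import chess

def board_to_tokens(board: chess.Board) -> list[str]:
    """Return 64 one-character tokens, a8..h1, '.' for empty squares."""
    placement = board.fen().split()[0]      # e.g. "rnbqkbnr/pppppppp/8/..."
    tokens = []
    for ch in placement:
        if ch == "/":
            continue                        # rank separator carries no square
        elif ch.isdigit():
            tokens.extend(["."] * int(ch))  # expand the empty-square run
        else:
            tokens.append(ch)               # piece letter; case gives color
    assert len(tokens) == 64
    return tokens

board = chess.Board()
board.push_san("e4")
print("".join(board_to_tokens(board)))
# rnbqkbnr pppppppp ........ ........ ....P... ........ PPPP.PPP RNBQKBNR
# (spaces between ranks added here for readability; actual output has none)
```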
The ChessGPT paper does something like that: https://arxiv.org/abs/2306.09200
I hadn’t seen that recent paper, thanks.