This thread analyzes what is going on under the hood with the chess transformer. It is a stronger player than the Stockfish version it was distilling, at the cost of more compute, but only by a fixed multiplier; it remains O(1).
I found this claim suspect because this basically is not a thing that happens in board games. In complex strategy board games like chess, practical amounts of search on top of a good prior policy and/or eval function (which Stockfish has) almost always outperform any pure forward-pass policy model that doesn’t do explicit search, even when that pure policy model is quite large and extensively trained. With any reasonable settings, it’s very unlikely that distilling Stockfish into a pure policy model produces a better player than Stockfish.
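To make the search-versus-no-search point concrete, here is a toy sketch. It is not chess and not the paper's method: the game is a small subtraction game (take 1 to 3 stones, whoever takes the last stone wins), and `static_eval`, `negamax`, `choose_move`, and `play` are hypothetical names I made up for illustration. Both players use the same deliberately uninformative eval; the only difference is how many plies of search sit on top of it.

```python
MOVES = (1, 2, 3)  # a move removes 1, 2, or 3 stones; taking the last stone wins


def legal_moves(pile):
    return [m for m in MOVES if m <= pile]


def static_eval(pile):
    # Deliberately weak eval from the side to move's point of view (an assumption
    # for illustration): it only recognizes an already-lost position.
    return -1 if pile == 0 else 0


def negamax(pile, depth):
    # Depth-limited negamax that falls back to static_eval at the horizon.
    if pile == 0 or depth == 0:
        return static_eval(pile)
    return max(-negamax(pile - m, depth - 1) for m in legal_moves(pile))


def choose_move(pile, depth):
    # depth=1 scores each successor with static_eval only (the no-search baseline);
    # larger depths search deeper with the same eval. Ties go to the larger move.
    return max(legal_moves(pile), key=lambda m: (-negamax(pile - m, depth - 1), m))


def play(pile, depths):
    # depths[i] is the search depth used by player i; returns the winner's index.
    player = 0
    while True:
        pile -= choose_move(pile, depths[player])
        if pile == 0:
            return player
        player = 1 - player


if __name__ == "__main__":
    print("no-search move from 5:", choose_move(5, 1))
    print("depth-3 move from 5:", choose_move(5, 3))
    print("winner, searcher moves first:", play(5, (3, 1)))
    print("winner, searcher moves second:", play(5, (1, 3)))
```

From a pile of 5, the depth-3 player finds the winning move (leave 4) while the no-search player does not, and the searcher wins whether it moves first or second. A distilled policy can of course learn more than a weak static eval, so this only illustrates why adding search to a fixed eval is hard to beat, not how large the gap is in chess.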
I skimmed the paper (https://arxiv.org/pdf/2402.04494) and had trouble finding such a claim. Indeed, it seems the original poster of that thread later retracted the claim, attributing it to their own mistake in interpreting a data table in the paper. The post where they acknowledge the mistake (https://x.com/sytelus/status/1848239379753717874) is much less prominent than the original post. The chess transformer remains quite a bit weaker than the Stockfish it tries to predict/imitate.