I don’t know much about Leela Zero and KataGo, but I do know that Leela Chess Zero (lc0) without search (pure policy) is near superhuman levels. I’ll see if I can dig up more precise specifics.
The LC0 pure policy is most certainly not superhuman. To test this, I just had it (network 791556, i.e. the standard network of the current LC0 release) play a game against a weak computer opponent (Shredder Chess Online). SCO plays maybe at the level of a strong expert/weak candidate master at rapid chess time controls (but it plays a lot faster, which makes generating a test game more convenient than trying to beat policy-only lc0 myself, which I think should be doable). The result was a draw: lc0 first completely outplayed SCO positionally, then blundered tactically in a completely won position with a strong-looking move that had a simple tactical refutation. It then probably still had a very slight advantage, but opted to take a draw by repetition.
I think policy-only lc0 plays worse relative to strong humans than KataGo/Leela Zero do in Go. I would attribute this to chess being easier to lose by tactical blunder than Go.
791556 is nowhere near the strongest network available. It’s packaged with lc0 as a nice small net. The BT2 net currently playing at tcec-chess.com is several hundred Elo stronger than T79 and likely close to superhuman level, depending on the time control. It’s not the very latest and greatest, but it is publicly available for download and should work with the 0.30.0-rc1 pre-release version of lc0, which supports the newer transformer architecture, if you want to try it yourself. If you only want completely “official” nets, at least grab one of the latest networks from the main T80 run.
I’m not confident that BT2 is strictly superhuman using pure policy but I’m pretty sure it’s at least close. LazyBot is a Lichess bot that plays pure policy but uses a T80 net that is likely at least 100 Elo weaker than BT2.
Thanks for the information. I’ll try out BT2. Against LazyBot I was just now able to get a draw in a blitz game with 3 seconds increment, which I don’t think I could do within a few tries against an opponent of, say, low grandmaster strength (with low grandmaster strength still being quite far away from superhuman). Since pure policy does not improve with thinking time, I think my chances would be much better at longer time controls. Certainly its Lichess rating at slow time controls suggests that T80 is not more than master strength when its human opponents have more than 15 minutes for the whole game.
Self-play Elo vastly exaggerates playing strength differences between different networks, so I would not expect a BT2 vs T80 difference of 100 Elo points to translate to anything close to a 100 Elo difference in playing strength against humans.
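For a rough sense of what a 100-point gap means, here is a minimal sketch using the standard Elo logistic formula for expected score. The `compression` factor is purely a hypothetical knob for illustration: it stands in for however much a self-play gap shrinks against humans, and the values used below are not measurements of anything.

```python
def expected_score(rating_diff: float) -> float:
    """Expected score (win = 1, draw = 0.5) for the higher-rated side
    under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def compressed_diff(self_play_diff: float, compression: float) -> float:
    """Scale a self-play Elo gap by a HYPOTHETICAL compression factor.

    compression = 1.0 means the gap carries over fully to human opposition;
    smaller values mean the gap shrinks. The true factor is unknown.
    """
    return self_play_diff * compression


if __name__ == "__main__":
    self_play_gap = 100.0  # the BT2 vs T80 gap discussed above
    for compression in (1.0, 0.5, 0.25):  # illustrative values only
        gap = compressed_diff(self_play_gap, compression)
        print(f"compression={compression:.2f}  "
              f"effective gap={gap:.0f}  "
              f"expected score={expected_score(gap):.3f}")
```

At face value a 100 Elo gap corresponds to an expected score of about 0.64; if only half of it carried over, that would drop to roughly 0.57, which would be hard to notice in a handful of games.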
Yes, clearly the less time the human has, the better Leela will do relatively. One thing to note though is that Lichess Elo isn’t completely comparable across different time controls. If you look at the player leaderboard, you can see that the top scores for bullet are ~600 greater than for classical, so scores need to be interpreted in context.
Self-play Elo inflation is a fair point to bring up, and I don’t have information on how well it translates to strength against humans.