Zvi comments on Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

Zvi 6 Dec 2017 21:34 UTC
15 points
The paper gives AZ’s Elo as strangely low (~3000) versus the top Stockfish Elos of around 3400. And there’s this quote, which I was pointed to elsewhere, about the way they configured Stockfish:
It is a nice step in a different direction, perhaps the start of the revolution, but Alpha Zero is not yet better than Stockfish, and if you bear with me I will explain why. Most people are very excited right now and hoping for a sensation, so they don’t really read the paper or think about what it says, which leads to uninformed opinions.
The testing conditions were terrible. 1 min/move is not really a suitable time control for engine testing, but you could tolerate that. What is intolerable is the hash table size: with the 64 cores Stockfish was given, you would expect around 32GB or more, otherwise the table fills up very quickly, leading to a marked reduction in strength. Only 1GB was given, which is far from an ideal value! Also, SF was not given any endgame tablebases, which is the current norm for any computer chess engine.
The computational power behind each entity was very different: while SF was given 64 CPU threads (really a lot, I’ve got to say), Alpha Zero was given 4 TPUs. A TPU is a specialized chip for machine learning and neural network calculations. Its estimated power compared to a classical CPU is as follows: 1 TPU ~ 30x E5-2699v3 (an 18-core machine), so Alpha Zero had the power of ~2000 Haswell cores behind it. That is nowhere near a fair match. And yet, even though the result was dominant, it was not where it would be if SF faced itself at 2000 cores vs 64 cores. In that case the win percentage would be much more heavily in favor of the more powerful hardware.
From those observations we can draw a conclusion: Alpha Zero is not as close in strength to SF as Google would like us to believe. The incorrect match settings suggest either a lack of knowledge about classical brute-force calculating engines and how they are properly used, or an intention to create conditions where SF would be defeated.
With all that said, it is still an amazing achievement and definitely a breath of fresh air in computer chess, most welcome these days. But for the new computer chess champion we will have to wait a little bit longer.
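For reference, the "~2000 Haswell cores" figure in the quote can be reproduced as rough arithmetic. This is only a sketch of the quoted estimate: the 30x-per-TPU multiplier and the 18-core count are the quote's own numbers, not anything measured here, and the function and constant names are made up for illustration.

```python
# All figures below come from the quoted estimate; none are measured.
TPUS = 4                 # TPUs given to Alpha Zero, per the quote
XEONS_PER_TPU = 30       # quoted claim: 1 TPU ~ 30x E5-2699v3
CORES_PER_XEON = 18      # core count of an E5-2699v3, per the quote
STOCKFISH_THREADS = 64   # CPU threads given to Stockfish, per the quote


def equivalent_cores(tpus: int, xeons_per_tpu: int, cores_per_xeon: int) -> int:
    """Multiply out the quoted conversion chain: TPUs -> Xeons -> cores."""
    return tpus * xeons_per_tpu * cores_per_xeon


az_cores = equivalent_cores(TPUS, XEONS_PER_TPU, CORES_PER_XEON)
ratio = az_cores / STOCKFISH_THREADS

print(f"Claimed Alpha Zero equivalent: {az_cores} cores")
print(f"Claimed hardware ratio vs Stockfish: {ratio:.2f}x")
```

The product is 2160, which the quote rounds to ~2000. Whether the 30x conversion is meaningful for a chess engine's workload is a separate question the arithmetic cannot answer.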
So while some of those games are impossibly cool, it’s likely that they are not as far along as they appear, although this is still obviously a major achievement.
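To put the ~3000 vs ~3400 Elo figures in perspective, here is a minimal sketch using the standard logistic Elo expected-score formula. It assumes both ratings are on the same scale, which may well not be true of the paper's numbers; `expected_score` is just an illustrative helper name.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score (win = 1, draw = 0.5) for player A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Assumption: both ratings are on one comparable scale.
lower = expected_score(3000, 3400)
higher = expected_score(3400, 3000)

print(f"Expected score for the 3000-rated side: {lower:.3f}")
print(f"Expected score for the 3400-rated side: {higher:.3f}")
```

A 400-point gap means the lower-rated side is expected to score about 9% of the points, which is why a ~3000 rating looks odd for an engine that reportedly beat a ~3400 one.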
- Zhaslan Dochshanov 8 Dec 2017 8:48 UTC
  3 points
  In terms of power consumption, TPUs and CPUs are very similar. TPUs are of no use for anything besides matrix multiplication, and Stockfish uses many other types of computation. It’s just not fair to say that Stockfish was up against 2000 cores.