One component of AlphaZero is a neural net which takes a board position as input, and outputs a guess about how good the position is and what a good next move would be. It combines this neural net with Monte Carlo Tree Search (MCTS) that plays out different ways the game could go, before choosing the move. The MCTS is used both during self-play to train the neural net, and during competitive test-time. I’m mainly curious about whether the latter is necessary.
So my question is: Once you have the fully-trained AlphaZero system, if you then turn off the MCTS and just choose moves directly with the neural net policy head, is it any good? Is it professional-level, amateur-level, child-level?
(I think this would be a fun little data-point related to discussions of how powerful an AI can be with and without mesa-optimization / search-processes using a generative environmental model.)
The paper includes the ELO for just the NN. I believe it’s professional level but not superhuman, but you should check if you really need to know. However, note that Alphazero’s actual play doesn’t use MCTS at all, it uses a simple tree search which only descends a few ply.
Thanks for your answer! But I’m afraid I’m confused on both counts.
I couldn’t, and still can’t, find “ELO for just the NN” in the paper… :-( I checked the arxiv version and preprint version.
As for “actual play doesn’t use MCTS at all”, well the authors say it does use MCTS… Am I misunderstanding the authors, or are you saying that the “thing the authors call MCTS” is not actually MCTS? (For example, I understand that it’s not actually random.)
You want the original ‘AlphaGo Zero’ paper, not the later ‘AlphaZero’ papers, which merely simplify it and reuse it in other domains; the AGZ paper is more informative than the AZ papers. See Figure 6b, and pg25 for the tree search details:
So the raw NN—a single forward pass and selecting the max—is 3k ELO, about 100 ELO under AlphaGo Fan, which soundly defeated a human professional (Fan Hui). I’m not sure whether −100 ELO is enough to demote it to ‘amateur’ status, but it’s at least clearly not that far from professional in the worst case.
EDIT: for a much more thorough and rigorous discussion of how you can exchange training for runtime tree search, see Jones 2021; this lets you calculate how much you’d have to spend to train a (probably larger) AlphaZero to close that 100 ELO gap, or to try to get up to 4,858 ELO with solely a forward pass and no search.
Either your understanding is correct or mine isn’t: AlphaGo Zero and AlphaZero _do_ do a tree search that the DM papers call “Monte Carlo Tree Search” but that doesn’t involve actual Monte Carlo playouts and that doesn’t match e.g. the description on the Wikipedia page about MCTS.
But it does use MCTS in training. You might say that it uses MCTS to generate a better player to learn from.
Sure. But the final player does not use MCTS, and it’s interesting that it’s not necessary then. (It’s even more interesting that the way they discovered they didn’t need MCTS is by hyperparameter optimization, but that’s a different discussion.)
This is incorrect. It is International Master-level without tree search. Good amateur, but there are >1000 players in the world that are better.
And it is neither MCTS or a “simple tree search”, it uses PUCT, often calculating very deeply in a few lines.
International masters are emphatically not amateurs. Indeed, IMs are at the level where they can offer coaching services to amateur players, and reasonably expect to be paid something on the order of $100 per session. To elaborate on this point:
The total number of FIDE-rated chess players is over 500,000. The number of IMs, meanwhile, totals less than 3,000. IMs are quite literally in the 99th percentile of chess ability, and that’s actually being extremely restrictive with the population—there are many casual players who don’t have FIDE ratings at all, since only people who play in at least one FIDE-rated tournament will be assigned a rating.
I didn’t say anything about chess or shogi because I don’t recall any ablation for A0, I just remember the one in the AG0 paper for Go. The AG0 is definitely at or close to professional level and better than ‘good amateur’. And I would consider a non-distributed PUCT with no rollouts or other refinements to be a ‘simple tree search’: it doesn’t do any rollouts, and the depth is seriously limited by running on only a single machine w/4 TPUs with a few seconds for search: as the AG0 paper puts it, “Finally, it uses a simpler tree search that relies upon this single neural network to evaluate positions and sample moves, without performing any Monte-Carlo rollouts...we chose to use the simplest possible search algorithm”.