Thanks for your answer! But I’m afraid I’m confused on both counts.
I couldn’t, and still can’t, find “ELO for just the NN” in the paper… :-( I checked both the arXiv version and the preprint version.
As for “actual play doesn’t use MCTS at all”, well, the authors say it does use MCTS… Am I misunderstanding the authors, or are you saying that the “thing the authors call MCTS” is not actually MCTS? (For example, I understand that it’s not actually random.)
You want the original ‘AlphaGo Zero’ paper, not the later ‘AlphaZero’ papers, which merely simplify it and reuse it in other domains; the AGZ paper is more informative than the AZ papers. See Figure 6b, and p. 25 for the tree-search details:
> Figure 6b shows the performance of each program on an Elo scale. The raw neural network, without using any lookahead, achieved an Elo rating of 3,055. AlphaGo Zero achieved a rating of 5,185, compared to 4,858 for AlphaGo Master, 3,739 for AlphaGo Lee and 3,144 for AlphaGo Fan.
So the raw NN (a single forward pass, then playing the move with the highest policy probability) is about 3,055 Elo, roughly 100 Elo below AlphaGo Fan (3,144), which soundly defeated a human professional (Fan Hui). I’m not sure whether −100 Elo is enough to demote it to ‘amateur’ status, but it’s at least clearly not that far from professional in the worst case.
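To make that 100-point gap concrete: under the standard Elo logistic model (the usual formula, nothing specific to the paper), a 100-point deficit still means winning roughly a third of games. A quick sketch using the Figure 6b ratings:

```python
def elo_expected_score(r_a: float, r_b: float) -> float:
    """Expected score for player A under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

# Ratings from Figure 6b of the AlphaGo Zero paper.
raw_nn, alphago_fan, alphago_master = 3055, 3144, 4858

print(f"{elo_expected_score(raw_nn, alphago_fan):.2f}")     # ~0.37: wins about 1 game in 3
print(f"{elo_expected_score(raw_nn, alphago_master):.5f}")  # ~0.00003: essentially never
```

So the search-free network would still take about a third of its games off the system that beat Fan Hui, which is why ‘amateur’ seems like an odd label for it.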
EDIT: for a much more thorough and rigorous discussion of how you can exchange training compute for runtime tree search, see Jones 2021 (“Scaling Scaling Laws with Board Games”); this lets you calculate how much you’d have to spend to train a (probably larger) AlphaZero to close that 100 Elo gap, or to try to get up to 4,858 Elo with solely a forward pass and no search.
Either your understanding is correct or mine isn’t: AlphaGo Zero and AlphaZero _do_ run a tree search that the DeepMind papers call “Monte Carlo Tree Search”, but it doesn’t involve actual Monte Carlo playouts (leaf positions are scored by the network’s value head rather than by random rollouts to the end of the game), so it doesn’t match e.g. the classic description on the Wikipedia page about MCTS.
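For concreteness, here is a minimal sketch of that rollout-free search, assuming a hypothetical `net.evaluate(state) -> (policy, value)` wrapper around the network’s forward pass and a functional `state.play(move)` game interface (both names are mine, not the paper’s). Selection uses the PUCT rule from the paper; the leaf is evaluated by the value head instead of being played out randomly. The real implementation batches evaluations, adds Dirichlet noise at the root, uses virtual loss, and so on, so treat this as illustrative rather than DeepMind’s code:

```python
import math

class Node:
    """One edge/state in the search tree."""
    def __init__(self, prior: float):
        self.prior = prior       # P(s, a): policy-head probability of the move into this node
        self.visits = 0          # N(s, a)
        self.value_sum = 0.0     # W(s, a)
        self.children = {}       # move -> Node

    def q(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node: "Node", c_puct: float = 1.5):
    # PUCT: argmax_a [ Q(s,a) + c_puct * P(s,a) * sqrt(N(s)) / (1 + N(s,a)) ]
    total = sum(child.visits for child in node.children.values())
    return max(
        node.children.items(),
        key=lambda kv: kv[1].q()
        + c_puct * kv[1].prior * math.sqrt(total) / (1 + kv[1].visits),
    )

def search(root_state, net, num_simulations: int = 800):
    policy, _ = net.evaluate(root_state)            # hypothetical forward-pass wrapper
    root = Node(prior=1.0)
    root.children = {m: Node(p) for m, p in policy.items()}
    for _ in range(num_simulations):
        node, state, path = root, root_state, [root]
        while node.children:                        # 1. selection: walk down via PUCT
            move, node = select_child(node)
            state = state.play(move)                # hypothetical immutable game API
            path.append(node)
        if state.is_terminal():
            value = state.terminal_value()          # outcome for the player to move
        else:                                       # 2. expansion + evaluation:
            policy, value = net.evaluate(state)     #    the value head replaces the rollout
            node.children = {m: Node(p) for m, p in policy.items()}
        for n in reversed(path):                    # 3. backup, alternating perspectives:
            value = -value                          #    flip to the player who moved into n
            n.visits += 1
            n.value_sum += value
    # Deterministic play: pick the most-visited root move (temperature -> 0).
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

Note that nothing in this loop is stochastic given the network; the only randomness in the pipeline is the temperature-based move sampling during self-play training, which is presumably what you meant by “not actually random”.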