>Fourth, planning is not necessary at all for the NN to compute results just as strong as tree search would: just like regular AlphaZero, the policy network on its own, with no rollouts or trees involved of any sort, is very strong, and they show that it increases greatly in strength over training. We also have the scaling law work of Andy Jones, verifying the intuition that anything tree search does can be efficiently distilled into a non-tree-search model trained for longer. (I would also point out the steeply diminishing returns to both depth & number of iterations: AlphaZero or Master, IIRC, used only a few TPUs because the tree-search was a simple one which only descended a few plies; you can also see in the papers like the MuZero appendix referenced that most of the play strength comes from just a few iterations, and they don’t even evaluate at more than 800, IIRC. It seems like what tree search does qualitatively is correct the occasional blind spot where the NN thinks forward a few moves for its best move and goes ‘oh shit! That’s actually a bad idea!’. It’s not doing anything super-impressive or subtle. It’s just a modest local policy iteration update, if you will. But the NN is what does almost all of the work.)
I mean, you’re just voicing your own dubious interpretations here; the MuZero paper does not, in fact, evaluate its model on Chess or Go without tree search, so we do not know how strong it would be. AlphaGo Zero did so, but the model was not very strong without tree search. The scaling work of Andy Jones doesn’t seem to apply to this scenario, unless I’m missing something.
Elsewhere you linked to this:
https://arxiv.org/abs/2011.04021#deepmind
If I’m reading it right, it says that removing tree search from MuZero trained on 9x9 Go reduces its performance to below that of a bot which plays at a “strong amateur level”.
I think you are just wrong that MuZero can perform well on games like Go without tree search. Whether you want to call it “MCTS” is a semantic debate (I never called it that, but merely said MuZero uses a variant of MCTS).
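For concreteness, the mechanism the quoted passage describes, search as a modest local policy-iteration update that catches the prior's occasional blind spot, can be sketched as a toy one-ply PUCT-style search. Everything here is invented for illustration (the prior, the values, the constants); it is not MuZero's actual MCTS:

```python
import math

# Toy single-state game: two legal moves with known true values.
# The "policy network" prior has a blind spot: it strongly prefers
# the move that is actually bad.
PRIOR = [0.9, 0.1]          # hypothetical policy-head output
TRUE_VALUE = [-1.0, 1.0]    # ground-truth value of each move

def raw_policy_move(prior):
    """Play straight from the policy head: no search of any sort."""
    return max(range(len(prior)), key=lambda a: prior[a])

def one_ply_search(prior, value_fn, n_sims=100, c_puct=1.5):
    """A minimal one-ply PUCT-style search (a sketch, not MuZero's MCTS).

    Repeatedly picks the action maximising Q + U, evaluates it, and
    returns the most-visited action: a local policy-iteration update
    on the prior, as described in the quoted passage.
    """
    visits = [0] * len(prior)
    total_value = [0.0] * len(prior)
    for _ in range(n_sims):
        total_n = sum(visits)
        def score(a):
            q = total_value[a] / visits[a] if visits[a] else 0.0
            u = c_puct * prior[a] * math.sqrt(total_n + 1) / (1 + visits[a])
            return q + u
        a = max(range(len(prior)), key=score)
        visits[a] += 1
        total_value[a] += value_fn(a)
    return max(range(len(visits)), key=lambda a: visits[a]), visits

if __name__ == "__main__":
    # The raw policy walks into the blind spot (move 0); the search,
    # after evaluating both moves, switches to move 1.
    print("raw policy plays:", raw_policy_move(PRIOR))
    move, visits = one_ply_search(PRIOR, lambda a: TRUE_VALUE[a])
    print("search plays:", move, "visits:", visits)
```

The disagreement in the thread is then over magnitudes: how much playing strength survives when this search step is removed and only `raw_policy_move` remains.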