I'll take a look, but afaik AlphaZero only uses an NN for position evaluation in MCTS, and not for the search part itself?

Looking at the AlphaZero paper:

"Our new method uses a deep neural network fθ with parameters θ. This neural network takes as an input the raw board representation s of the position and its history, and outputs both move probabilities and a value, (p, v) = fθ(s). The vector of move probabilities p represents the probability of selecting each move a (including pass), p_a = Pr(a | s). The value v is a scalar evaluation, estimating the probability of the current player winning from position s. This neural network combines the roles of both policy network and value network [12] into a single architecture. The neural network consists of many residual blocks [4] of convolutional layers [16,17] with batch normalization [18] and rectifier nonlinearities [19] (see Methods)."
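The (p, v) = fθ(s) interface quoted above can be made concrete with a toy forward pass. This is not the paper's architecture (which is a deep residual conv net); the single linear-ReLU "trunk" and all sizes here are placeholders, but the two heads — a softmax over 19×19 moves plus pass, and a scalar evaluation — match the description:

```python
import numpy as np

# Toy sketch of (p, v) = f_theta(s). The linear trunk is a stand-in for the
# paper's residual conv tower; only the input/output shapes follow the text:
# 17 binary feature planes of a 19x19 board in, 361 moves + pass out.
rng = np.random.default_rng(0)
N_FEATURES = 17 * 19 * 19      # flattened board representation s (with history)
N_MOVES = 19 * 19 + 1          # every board point, plus pass
HIDDEN = 256                   # arbitrary placeholder width

W_trunk = rng.normal(0, 0.01, (HIDDEN, N_FEATURES))
W_pol = rng.normal(0, 0.01, (N_MOVES, HIDDEN))
W_val = rng.normal(0, 0.01, (1, HIDDEN))

def f_theta(s):
    h = np.maximum(W_trunk @ s, 0)                    # shared trunk (ReLU)
    logits = W_pol @ h
    p = np.exp(logits - logits.max())                 # policy head:
    p /= p.sum()                                      #   softmax over moves
    v = np.tanh(W_val @ h)[0]                         # value head: scalar in (-1, 1)
    return p, v
```

The point of the sketch is just that one network emits both outputs from a shared trunk, which is what "combines the roles of both policy network and value network" means.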
So if I’m interpreting that correctly, the NN is used for both position evaluation and also for the search part.
The implicit claim is that the policy might be doing internal search?
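For what the quote literally describes, the network does enter the search itself: in AlphaZero-style MCTS the prior p biases in-tree move selection via the PUCT rule, and v is backed up from leaves in place of a rollout. A minimal sketch (not DeepMind's code; the Node fields and c_puct value are illustrative):

```python
import math

class Node:
    def __init__(self, prior):
        self.prior = prior        # p_a from the network for the move into this node
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}        # move -> Node

def puct_score(parent, child, c_puct=1.5):
    # Q + U: exploitation plus a prior-weighted exploration bonus.
    # The network's policy prior scales U, steering which branches get expanded.
    q = child.value_sum / child.visits if child.visits else 0.0
    u = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return q + u

def select_child(parent, c_puct=1.5):
    return max(parent.children.items(),
               key=lambda mc: puct_score(parent, mc[1], c_puct))

def backup(path, v):
    # v is the network's leaf evaluation, from the leaf player's perspective;
    # flip the sign at each step up the tree (two-player zero-sum).
    for node in reversed(path):
        node.visits += 1
        node.value_sum += v
        v = -v
```

So "NN for position evaluation only" undersells it: selection inside the tree is shaped by p on every simulation, not just at the leaves.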
Leaving aside the issue of to what extent the NN itself is already doing something approximately isomorphic to search or how easy it would be to swap in MuZero instead, I think that the important thing is to measure the benefit of search in particular problems (like Jones does by sweeping over search budgets vs training budgets for Hex etc) rather than how hard the exact algorithm of search itself is.
I mean, MCTS is a simple generic algorithm; you can just treat learning it in a ‘neural’ way as a fixed cost—there’s not much in the way of scaling laws to measure about the MCTS implementation itself. MCTS is MCTS. You can plug in chess as easily as Go or Hex.
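The "plug in chess as easily as Go" point can be made concrete: vanilla MCTS touches the game only through a small state interface, so swapping games means swapping that object, not the search. A sketch under assumed interface names (they're mine, not from any particular codebase), with a toy take-away game standing in for Go:

```python
import random

class TakeAwayGame:
    """Toy game: players alternately remove 1 or 2 stones;
    whoever takes the last stone wins."""
    def __init__(self, stones, player=0, winner=None):
        self.stones, self.player, self.winner = stones, player, winner

    def legal_moves(self):
        return [m for m in (1, 2) if m <= self.stones]

    def play(self, move):
        left = self.stones - move
        return TakeAwayGame(left, 1 - self.player,
                            winner=self.player if left == 0 else None)

    def is_terminal(self):
        return self.stones == 0

    def result(self):
        return self.winner    # id of the winning player

def rollout(state):
    """Vanilla MCTS's default leaf evaluator: uniformly random play to the
    end. Identical code whatever game object is passed in — this is the
    sense in which the search algorithm itself is a fixed cost."""
    while not state.is_terminal():
        state = state.play(random.choice(state.legal_moves()))
    return state.result()
```

Replace `TakeAwayGame` with a chess, Go, or Hex state exposing the same four methods and nothing in the search changes — which is why the interesting scaling questions attach to the problems and the networks, not to the MCTS implementation.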
It seems much more interesting to know about how expensive ‘real’ problems like Hex or Go are, how well NNs learn, how to trade off architectures or allocate compute between train & runtime...