faul_sname comments on shortplav

faul_sname 10 Jan 2024 7:44 UTC
4 points
Looking at the AlphaZero paper
Our new method uses a deep neural network fθ with parameters θ. This neural network takes as an input the raw board representation s of the position and its history, and outputs both move probabilities and a value, (p, v) = fθ(s). The vector of move probabilities p represents the probability of selecting each move a (including pass), pa = Pr(a| s). The value v is a scalar evaluation, estimating the probability of the current player winning from position s. This neural network combines the roles of both policy network and value network12 into a single architecture. The neural network consists of many residual blocks4 of convolutional layers16,17 with batch normalization18 and rectifier nonlinearities19 (see Methods).
So if I’m interpreting that correctly, the NN is used for both position evaluation and also for the search part.