ESRogs comments on Review of A Map that Reflects the Territory

ESRogs 13 Sep 2021 21:33 UTC
6 points
Also note that the AlphaZero algorithm is an example of iterated distillation and amplification (IDA):
- The amplification step is when the policy / value neural net is used to guide a search some number of moves ahead in the game tree (Monte Carlo tree search, in AlphaZero's case), resulting in a better guess at the best move than the raw output of the net alone.
- The distillation step is when the policy / value net is trained to match the output of the game tree exploration process.
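
To make the two steps concrete, here is a minimal sketch on a toy game. It is not AlphaZero itself: a lookup table stands in for the policy / value net, and a fixed-depth negamax search stands in for Monte Carlo tree search. The game (take 1 or 2 stones, whoever takes the last stone wins) and all the names below (`amplify`, `distill`, `run_ida`, `MAX_STONES`) are assumptions chosen for illustration.

```python
# Toy IDA loop. Assumptions: a lookup table replaces the policy/value net,
# and depth-limited negamax replaces AlphaZero's Monte Carlo tree search.
# Game: players alternately take 1 or 2 stones; taking the last stone wins.

MOVES = (1, 2)
MAX_STONES = 12


def amplify(value, stones, depth):
    """Search `depth` plies ahead, falling back on the table at the leaves.

    Returns (estimated value for the player to move, best move found).
    """
    if stones == 0:
        return -1.0, None  # the previous player took the last stone and won
    if depth == 0:
        return value[stones], None  # leaf: trust the current table
    best_value, best_move = -float("inf"), None
    for move in MOVES:
        if move <= stones:
            child_value, _ = amplify(value, stones - move, depth - 1)
            # The child's value is from the opponent's point of view, so negate.
            if -child_value > best_value:
                best_value, best_move = -child_value, move
    return best_value, best_move


def distill(value, depth):
    """Build a new table that matches the search output for every state."""
    new_value, new_policy = {}, {}
    for stones in range(1, MAX_STONES + 1):
        new_value[stones], new_policy[stones] = amplify(value, stones, depth)
    return new_value, new_policy


def run_ida(iterations, depth=2):
    """Alternate amplification and distillation, starting from a blank table."""
    value = {s: 0.0 for s in range(1, MAX_STONES + 1)}  # uninformed start
    policy = {s: 1 for s in range(1, MAX_STONES + 1)}
    for _ in range(iterations):
        value, policy = distill(value, depth)
    return value, policy
```

After a single round, the table alone is still neutral about a pile of 4 stones, but searching two plies on top of that table already finds the winning move (take 1, leaving 3). Each further round pushes that knowledge back into the table, so the region where the table alone is right keeps growing; with these settings, four rounds are enough for it to be right on every pile up to 12.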