not at 79 (when you can’t accidentally prune 78 because it’s already on the board).
Of course, but I can’t remember which was the other very low-probability move, so perhaps it was one of the later moves in that sequence?
I don’t recall much detail about AG, but I thought the training it did was to improve the policy net? If the policy net was only trained on amateur games, what was it learning from self-play?
I thought the self-play only trained the value net (because they want the policy net to keep predicting human moves, not its own moves), but I might be remembering incorrectly. Pity that the paper is behind a paywall.
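If it helps, my recollection of the Nature paper (Silver et al. 2016) is that self-play actually trains two things: an RL policy net (a copy of the supervised one, improved by policy gradients) and, from that RL policy’s self-play games, the value net. The supervised policy trained on human games is the one kept for move priors during search. A schematic sketch, with illustrative names of my own (the bodies are deliberately stubs, since the point is only which net trains on which data):

```python
# Schematic of the training pipeline in Silver et al. (2016),
# "Mastering the game of Go with deep neural networks and tree search".
# Function names and structure are my own illustration, not DeepMind's code.

def train_sl_policy(kgs_positions):
    """Stage 1 (supervised): a policy net trained to predict the next
    human move on ~30M positions from KGS games (strong amateurs)."""

def train_rl_policy(sl_policy):
    """Stage 2 (self-play RL): a copy of the SL policy, further trained
    with policy gradients by playing against randomly sampled earlier
    versions of itself. The original SL policy is left untouched."""

def train_value_net(rl_policy):
    """Stage 3: a value net trained by regression to predict the game
    winner, on positions sampled from the RL policy's self-play games."""

# During actual play, MCTS uses the SL policy (the human-predictive one)
# for move priors, and evaluates leaves with a mix of the value net and
# fast rollouts.
```

So "self-play only trained the value net" is half right: the value net’s training data does come from self-play, but self-play also produced a separate RL policy; it just wasn’t the policy used for priors in the tree search.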