Naively, pruning seems like it would cause a mistake at 77 (allowing the brilliant follow-up at 78), not at 79 (when you can’t accidentally prune 78 because it’s already on the board). But people have been saying that it made a mistake at 79.
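To make the pruning intuition concrete, here is a minimal sketch (not AlphaGo’s actual code; the PUCT-style selection rule, constants, and data layout are my assumptions) of how a move with a tiny policy prior ends up effectively pruned from the search:

```python
import math

C_PUCT = 1.5  # exploration constant; value made up for illustration

def select_child(children):
    """Pick the child to explore next, PUCT-style: value estimate plus a
    prior-weighted exploration bonus that shrinks as the child gets visited."""
    total_visits = sum(c["visits"] for c in children)
    def score(c):
        q = c["value_sum"] / c["visits"] if c["visits"] else 0.0
        u = C_PUCT * c["prior"] * math.sqrt(total_visits + 1) / (1 + c["visits"])
        return q + u
    return max(children, key=score)

# Two candidate replies: a "normal" move and a brilliant one the policy net
# thinks a human would almost never play (the figure reported for the
# surprising move was on the order of 1 in 10,000).
children = [
    {"prior": 0.30,   "visits": 0, "value_sum": 0.0},
    {"prior": 0.0001, "visits": 0, "value_sum": 0.0},
]
for _ in range(10_000):
    c = select_child(children)
    c["visits"] += 1
    c["value_sum"] += 0.5  # pretend every playout comes back as roughly even

print([c["visits"] for c in children])
# -> essentially all visits go to the high-prior move; the low-prior line is
#    never examined, so anything that hinges on it is invisible to the search.
```

Which is exactly why I’d expect the blind spot to bite before the low-prior move is played, not after.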
I don’t recall much detail about AG, but I thought the training it did was to improve the policy net? If the policy net was only trained on amateurs, what was it learning from self-play?
> not at 79 (when you can’t accidentally prune 78 because it’s already on the board)
Of course, but I can’t remember which was the other very low-probability move, so perhaps it was one of the later moves in that sequence?
> I don’t recall much detail about AG, but I thought the training it did was to improve the policy net? If the policy net was only trained on amateurs, what was it learning from self-play?
I thought the self-play only trained the value net (because they want the policy net to predict human moves, not its own moves), but I might be remembering incorrectly. Pity that the paper is behind a paywall.
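For what it’s worth, here is the split as I remember it, as a minimal sketch (the framework, names, and shapes are all made up for illustration, and my recollection of the paper may well be off): the policy net fit to human expert moves by supervised learning, and the value net fit to the final outcome of self-play games.

```python
import torch.nn as nn
import torch.nn.functional as F

BOARD_FEATURES = 19 * 19 * 3  # toy input: three feature planes, flattened

# Policy net: trained on human games to predict the move a strong player chose.
policy_net = nn.Sequential(nn.Linear(BOARD_FEATURES, 19 * 19))

# Value net: trained on positions from self-play games to predict the winner.
value_net = nn.Sequential(nn.Linear(BOARD_FEATURES, 1), nn.Tanh())

def policy_loss(positions, human_moves):
    """Supervised learning on the human game corpus: cross-entropy against
    the move actually played (an index into the 361 board points)."""
    return F.cross_entropy(policy_net(positions), human_moves)

def value_loss(positions, outcomes):
    """Regression on self-play positions: predict the eventual result
    (+1 for a win, -1 for a loss) of the game the position came from."""
    return F.mse_loss(value_net(positions).squeeze(-1), outcomes)
```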