not at 79 (when you can’t accidentally prune 78 because it’s already on the board).
Of course, but I can’t remember which was the other very low-probability move, so perhaps it was one of the later moves in that sequence?
I don’t recall much detail about AG, but I thought the training it did was to improve the policy net? If the policy net was only trained on amateur games, what was it learning from self-play?
I thought the self-play only trained the value net (because they want the policy net to keep predicting human moves, not its own moves), but I might be remembering incorrectly. Pity that the paper is behind a paywall.
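If it helps, my recollection of the Nature paper (Silver et al. 2016) is that self-play actually trains two things: an RL policy net (a copy of the supervised one, improved by policy gradients) and, from that RL policy’s self-play games, the value net. The supervised policy trained on human games is the one kept for move priors during search. A schematic sketch, with illustrative names of my own (the bodies are deliberately stubs, since the point is only which net trains on which data):

```python
# Schematic of the training pipeline in Silver et al. (2016),
# "Mastering the game of Go with deep neural networks and tree search".
# Function names and structure are my own illustration, not DeepMind's code.

def train_sl_policy(kgs_positions):
    """Stage 1 (supervised): a policy net trained to predict the next
    human move on ~30M positions from KGS games (strong amateurs)."""

def train_rl_policy(sl_policy):
    """Stage 2 (self-play RL): a copy of the SL policy, further trained
    with policy gradients by playing against randomly sampled earlier
    versions of itself. The original SL policy is left untouched."""

def train_value_net(rl_policy):
    """Stage 3: a value net trained by regression to predict the game
    winner, on positions sampled from the RL policy's self-play games."""

# During actual play, MCTS uses the SL policy (the human-predictive one)
# for move priors, and evaluates leaves with a mix of the value net and
# fast rollouts.
```

So "self-play only trained the value net" is half right: the value net’s training data does come from self-play, but self-play also produced a separate RL policy; it just wasn’t the policy used for priors in the tree search.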