bogus comments on Open thread, Jan. 25 - Jan. 31, 2016

bogus 1 Feb 2016 20:04 UTC
0 points

The second stage of the training pipeline aims at improving the policy network by policy gradient reinforcement learning (RL).

Except that they don’t seem to use the resulting RL-trained policy network in actual play; its only use is for deriving their state-evaluation (value) network.
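
For anyone unsure what "improving the policy by policy gradient RL" means mechanically, here is a minimal sketch of a REINFORCE-style update on a toy problem. It is not the paper's setup: there is no Go board and no neural network, just a softmax over three hypothetical "moves", one of which always wins (+1) while the others lose (-1). The names `train_policy`, `winning_move`, and the learning-rate and step values are my own assumptions for illustration.

```python
import math
import random


def softmax(prefs):
    """Turn a list of real-valued preferences into move probabilities."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(winning_move=2, n_moves=3, steps=2000, lr=0.1, seed=0):
    """REINFORCE on a one-step toy game (hypothetical stand-in for self-play).

    Each step: sample a move from the current policy, observe the
    outcome (+1 win, -1 loss), and nudge the preferences along
    outcome * grad(log pi(move)).
    """
    rng = random.Random(seed)
    prefs = [0.0] * n_moves  # policy parameters, one per move
    for _ in range(steps):
        probs = softmax(prefs)
        move = rng.choices(range(n_moves), weights=probs)[0]
        outcome = 1.0 if move == winning_move else -1.0
        for a in range(n_moves):
            # d/d prefs[a] of log softmax(prefs)[move]
            grad_log = (1.0 if a == move else 0.0) - probs[a]
            prefs[a] += lr * outcome * grad_log
    return softmax(prefs)


final_probs = train_policy()
print(final_probs)
```

The point of the sketch is only that the policy's own sampled games are what drive the update, so probability mass drifts toward moves that ended in wins. Whether the policy improved this way is then used for play or only as a data generator for something else is a separate design decision.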