“Nonexistent problems” was meant as hyperbole to say that they weren’t solved in interesting ways and are extremely simple in this setting because the states and rewards are noise-free. I am not sure what you mean by the second question. They just apply gradient descent over the entire history of moves of the current game so that the expected reward is maximized.
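To make concrete what I mean by “gradient descent over the entire history of moves,” here is a minimal REINFORCE-style sketch. It assumes a toy policy network and a hypothetical `env` with `reset()`/`step()` for self-play Go; none of this is from the paper, it’s just the shape of the update:

```python
import torch

# Toy policy network: flattens a 19x19 board and outputs one logit per point, plus pass.
policy = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(19 * 19, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 19 * 19 + 1),
)
optimizer = torch.optim.SGD(policy.parameters(), lr=1e-3)

def play_and_update(env):
    """Play one game, then do a single policy-gradient update over the
    entire history of moves, weighted by the (noise-free) final reward.
    `env` is a hypothetical Go environment returning board tensors."""
    log_probs = []
    board, done, reward = env.reset(), False, 0.0
    while not done:
        logits = policy(board.unsqueeze(0))
        dist = torch.distributions.Categorical(logits=logits)
        move = dist.sample()
        log_probs.append(dist.log_prob(move))
        board, reward, done = env.step(move.item())  # reward is 0 until the game ends

    # Terminal reward is +1 for a win, -1 for a loss; descend on its negation.
    loss = -reward * torch.stack(log_probs).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```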
It seems to me that the problem of value assignment to boards (“What’s the edge for W or B if the game state looks like this?”) is basically a solution to that problem, since it gives you the counterfactual information you need (how much would placing a stone here improve my edge?) to answer those questions.
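As a rough sketch of that counterfactual query (the `value_net` and the toy board representation here are my own placeholders, not anything from the system itself): you evaluate the position with and without the candidate stone and take the difference.

```python
import torch

def place_stone(board, point, player):
    """Return a copy of the board with `player`'s stone at `point`.
    Toy representation: 19x19 tensor, +1 for B, -1 for W, 0 for empty."""
    new_board = board.clone()
    new_board[point] = player
    return new_board

def edge_improvement(value_net, board, point, player):
    """Counterfactual question: how much does placing a stone at `point`
    change the value network's estimate of the position?"""
    with torch.no_grad():
        current = value_net(board.unsqueeze(0)).item()
        after = value_net(place_stone(board, point, player).unsqueeze(0)).item()
    return after - current  # positive means the placement improves the edge
```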
I agree that it’s a much simpler problem here than it is in a more complicated world, but I don’t think it’s trivial.