An East Asian Go pro will often have been an insei, studying Go full-time at a school, and a dedicated amateur before that, so you can imagine how many hours a day they will be studying… (The intensiveness is part of why they dominate Go to the degree they do and why North Americans & Europeans are so much weaker: start a lot of kids, start them young, school them 10 hours a day for years studying games and playing against each other and pros, and keep relentlessly filtering to winnow out anyone who is not brilliant.)
I would say 100k is an overestimate, since they will tend to be closely studying the games and commentaries, and also working out life-and-death problems, memorizing the standard openings, and whatnot; but they are definitely reading through and studying tens of thousands of games. This is similar to how one of the reasons chess players are so much better these days than even just decades ago is that computers have given access to enormous databases of games which can be studied with the help of chess AIs (Carlsen has benefited a lot from this, I understand). Also, while I’m nitpicking: AlphaGo trained on both the KGS dataset and then self-play. I don’t know how many games the self-play amounted to, but the appendix broke down the wallclock times by phase, and of the 4 weeks of wallclock time, IIRC most of it was spent on the self-play finetuning the value function.
But if AlphaGo is learning from games ‘only’ more efficiently than 99%+ of the humans who play Go (Fan Hui was ranked in the 600s, there’s maybe 1000-2000 people who earn a living as Go professionals, selected from the hundreds of thousands/millions of people who play), that doesn’t strike me as much of a slur.
For the SL phase, they trained for 340 million minibatch updates with a batch size of 16, so about 5.4 billion position-updates. However, the database had only 29 million unique positions. That’s nearly 200 gradient updates per unique position.
The self-play RL phase for AlphaGo consisted of 10,000 minibatches of 128 games each, so about 1.3 million games total. That phase was trained for only a day.
They spent more time training the value network: 50 million minibatches of 32 board positions, so about 1.6 billion positions. That’s still much smaller than the SL training phase.
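The arithmetic behind those three phases can be checked in a few lines. This is just a back-of-the-envelope sketch using the figures quoted above; the per-phase numbers come from the AlphaGo paper's reported hyperparameters, and the rounding is mine:

```python
# SL policy network: 340M minibatch updates of 16 positions each.
sl_position_updates = 340_000_000 * 16        # 5.44 billion position-updates
unique_positions = 29_000_000                 # unique positions in the KGS dataset
passes_per_position = sl_position_updates / unique_positions  # ~188, i.e. "about 200"

# RL policy network (self-play): 10,000 minibatches of 128 games each.
rl_games = 10_000 * 128                       # 1.28 million games

# Value network: 50M minibatches of 32 board positions each.
value_positions = 50_000_000 * 32             # 1.6 billion positions

print(f"SL position-updates:      {sl_position_updates:,}")
print(f"Passes per unique position: {passes_per_position:.0f}")
print(f"RL self-play games:       {rl_games:,}")
print(f"Value-net positions:      {value_positions:,}")
```

Note the value network's 1.6 billion positions is indeed well under the SL phase's 5.4 billion position-updates, even though the value network took more wallclock time.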