This is a big deal, and it is another sign that AGI is near.
Intelligence boils down to inference. Go is an interesting case because good play for both humans and bots like AlphaGo requires two specialized types of inference operating over very different timescales:
rapid combinatorial inference over move sequences during a game (planning). AlphaGo uses Monte Carlo tree search (MCTS) for this, whereas the human brain uses a complex network of modules involving the basal ganglia, hippocampus, and PFC.
slow deep inference over a huge amount of experience to develop strong pattern recognition and intuitions (deep learning). AlphaGo uses deep supervised and reinforcement learning via SGD over a CNN for this. The human brain uses the cortex.
Machines have been strong in planning/search-style inference for a while. It is only recently that the slower learning component (2nd-order inference over circuit/program structure) has started to approach and surpass human level.
Critics like to point out that DL requires tons of data, but so does the human brain. A more accurate comparison requires quantifying the dataset human pro Go players train on.
A 30 year old Asian pro will have perhaps 40,000 hours of playing experience (20 years * 50 weeks/year * 40 hrs/week). The average game duration is perhaps an hour and consists of 200 moves. In addition, pros (and even fans) study published games. Reading a game takes less time, perhaps as little as 5 minutes or so.
So we can estimate very roughly that a top pro will have absorbed between 100,000 and 1 million games, and between 20 and 200 million individual positions (around 200 moves per game).
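Spelled out as a quick back-of-the-envelope calculation (all of the inputs are the rough assumptions above, not measured values):

```python
# Rough arithmetic behind the estimate above; every input is an assumption, not data.
playing_hours = 20 * 50 * 40                  # 20 years * 50 weeks/year * 40 hrs/week
print(playing_hours)                          # 40000 hours of playing experience

moves_per_game = 200                          # assumed average game length
games_low, games_high = 100_000, 1_000_000    # rough bracket for games played + studied

print(games_low * moves_per_game)             # 20000000  -> ~20 million positions
print(games_high * moves_per_game)            # 200000000 -> ~200 million positions
```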
AlphaGo was trained on the KGS dataset: 160,000 games and 29 million positions. So it did not train on significantly more data than a human pro. The data quantities are actually very similar.
Furthermore, the human pro’s dataset is perhaps of higher quality, as they will be familiar mainly with pro-level games, whereas the AlphaGo dataset is mostly amateur level.
The main difference is speed. The human brain’s ‘clock rate’ or equivalent is about 100 hz, whereas AlphaGo’s various CNNs can run at roughly 1000 hz during training on a single machine, and perhaps a 10,000 hz equivalent distributed across hundreds of machines. 40,000 hours (a lifetime of experience) can be compressed 100x or more into just a couple of weeks for a machine. This is the key lesson here.
The classification CNN trained on KGS was run for 340 million steps, which is about 10 iterations per unique position in the database.
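As a rough check on both figures (the clock rates here are the assumed equivalents above, not measurements):

```python
# Time-compression and epoch arithmetic for the figures above (assumed round numbers).
brain_hz = 100                      # assumed 'clock rate' equivalent of the brain
machine_hz = 10_000                 # assumed distributed-training equivalent rate

speedup = machine_hz / brain_hz     # ~100x
lifetime_hours = 40_000
print(lifetime_hours / speedup / 24)    # ~16.7 days, i.e. a couple of weeks

steps = 340_000_000                 # classification CNN training steps on KGS
positions = 29_000_000              # unique positions in the KGS dataset
print(steps / positions)            # ~11.7, i.e. roughly 10 passes per position
```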
The ANNs that AlphaGo uses are much, much smaller than a human brain, but the brain has to do a huge number of other tasks, and also has to solve complex vision and motor problems just to play the game. AlphaGo’s ANNs get to focus purely on Go.
A few hundred TitanX’s can muster up perhaps a petaflop of compute. The high end estimate of the brain is 10 petaflops (100 trillion synapses * 100 hz max firing rate). The more realistic estimate is 100 teraflops (100 trillion synapses * 1 hz avg firing rate), and the lower end is 1/10 that or less.
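In round numbers (the synapse count and firing rates are the assumed figures above):

```python
# Crude brain-compute estimates from the assumed figures above.
synapses = 100e12                 # ~100 trillion synapses

high_end  = synapses * 100        # 100 hz max firing rate   -> 1e16 ops/s = 10 petaflops
realistic = synapses * 1          # 1 hz average firing rate -> 1e14 ops/s = 100 teraflops
low_end   = realistic / 10        # -> ~10 teraflops or less

for label, ops in [("high", high_end), ("realistic", realistic), ("low", low_end)]:
    print(label, ops / 1e12, "teraflops")
```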
So why is this a big deal? Because it suggests that training a DL AI to master other economically important tasks, such as becoming an expert-level programmer, could be much closer than people think.
The techniques used here are nowhere near their optimal form yet in terms of efficiency. When Deep Blue beat Kasparov in 1997, it required a specialized supercomputer and a huge team. Ten years later, chess bots written by individual programmers running on modest PCs soared past Deep Blue, thanks to more efficient algorithms and implementations.
A 30 year old Asian pro will have perhaps 40,000 hours of playing experience (20 years * 50 weeks/year * 40 hrs/week). The average game duration is perhaps an hour and consists of 200 moves. In addition, pros (and even fans) study published games. Reading a game takes less time, perhaps as little as 5 minutes or so.
So we can estimate very roughly that a top pro will have absorbed between 100,000 and 1 million games, and between 20 and 200 million individual positions (around 200 moves per game).
I asked a pro player I know whether these numbers sounded reasonable. He replied:
At least the order of magnitude should be more or less right. Hours of playing weekly is probably somewhat lower on average (say 20-30 hours), and I’d also use 10-15 minutes to read a game instead of five. Just 300 seconds to place 200 stones sounds pretty tough. Still, I’d imagine that a 30-year-old professional has seen at least 50,000 games, and possibly many more.
Critics like to point out that DL requires tons of data, but so does the human brain.
Both deep networks and the human brain require lots of data, but the kind of data they require is not the same. Humans engage mostly in semi-supervised learning, where supervised data comprises a small fraction of the total. They also manage feats of “one-shot learning” (making critically-important generalizations from single datapoints) that are simply not feasible for neural networks or indeed other ‘machine learning’ methods.
A few hundred TitanX’s can muster up perhaps a petaflop of compute.
Could you elaborate? I think this number is too high by roughly one order of magnitude.
The high end estimate of the brain is 10 petaflops (100 trillion synapses * 100 hz max firing rate).
Estimating the computational capability of the human brain is very difficult. Among other things, we don’t know what the neuroglia cells may be up to, and these are just as numerous as neurons.
Both deep networks and the human brain require lots of data, but the kind of data they require is not the same. Humans engage mostly in semi-supervised learning, where supervised data comprises a small fraction of the total.
This is probably a misconception for several reasons. Firstly, given that we don’t fully understand the learning mechanisms in the brain yet, it’s unlikely that it’s mostly one thing. Secondly, we have some pretty good evidence for reinforcement learning in the cortex, hippocampus, and basal ganglia. We have evidence for internally supervised learning in the cerebellum, and unsupervised learning in the cortex.
The point being: these labels aren’t all that useful. Efficient learning is multi-objective and doesn’t cleanly divide into these narrow categories.
The best current guess for questions like this is almost always that the brain’s solution is highly efficient, given its constraints.
In the situation where a Go player experiences/watches a game between two other players far above one’s own current skill, the optimal learning update is probably going to be an SL-style update. Even if you can’t understand the reasons behind the moves yet, it’s best to compress them into the cortex for later. If you can do a local search to understand why the move is good, then that is even better and it becomes more like RL, but again, these hard divisions are arbitrary and limiting.
A few hundred TitanX’s can muster up perhaps a petaflop of compute.
Could you elaborate? I think this number is too high by roughly one order of magnitude.
The GTX TitanX has a peak performance of 6.1 teraflops, so you’d need only a few hundred to get a petaflop supercomputer (more specifically, around 175).
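Taking that 6.1-teraflop peak figure at face value, the arithmetic is roughly:

```python
# Peak-throughput arithmetic for the TitanX claim (peak figures only).
titanx_peak_tflops = 6.1
cards = 175
print(cards * titanx_peak_tflops)    # ~1067.5 teraflops: just over a petaflop at peak
print(1000 / titanx_peak_tflops)     # ~164 cards needed for exactly one petaflop
```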
The high end estimate of the brain is 10 petaflops (100 trillion synapses * 100 hz max firing rate).
Estimating the computational capability of the human brain is very difficult. Among other things, we don’t know what the neuroglia cells may be up to, and these are just as numerous as neurons.
It’s just a circuit, and it obeys the same physical laws. We have this urge to mystify it for various reasons. Neuroglia cannot possibly contribute more to the total compute power than the neurons, based on simple physics/energy arguments. It’s another stupid red herring, like quantum woo.
These estimates are only validated when you can use them to make predictions. And if you have the right estimates (brain equivalent to 100 teraflops-ish, give or take an order of magnitude), you can roughly predict the outcome of many comparisons between brain circuits and equivalent ANN circuits (more accurately than using the wrong estimates).
This is probably a misconception for several reasons. Firstly, given that we don’t fully understand the learning mechanisms in the brain yet, it’s unlikely that it’s mostly one thing …
We don’t understand the learning mechanisms yet, but we’re quite familiar with the data they use as input. “Internally” supervised learning is just another term for semi-supervised learning anyway. Semi-supervised learning is plenty flexible enough to encompass the “multi-objective” features of what occurs in the brain.
The GTX TitanX has a peak performance of 6.1 teraflops, so you’d need only a few hundred to get a petaflop supercomputer (more specifically, around 175).
Raw and “peak performance” FLOPS numbers should be taken with a grain of salt. Anyway, given that a TitanX apparently draws as much as 240W of power at full load, your “petaflop-scale supercomputer” will cost you a few hundred-thousand dollars and draw 42kW to do what the brain does within 20W or so. Not a very sensible use for that amount of computing power—except for the odd publicity stunt, I suppose. Like playing Go.
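In rough numbers (using the 175-card, 240W, and 20W figures above):

```python
# Power comparison for the hypothetical 175-card machine vs. the brain's ~20W budget.
cards, watts_per_card = 175, 240
cluster_watts = cards * watts_per_card
print(cluster_watts)            # 42000 W, i.e. ~42 kW
print(cluster_watts / 20)       # ~2100x the brain's power budget
```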
It’s just a circuit, and it obeys the same physical laws.
Of course. Neuroglia are not magic or “woo”. They’re physical things, much like silicon chips and neurons.
Raw and “peak performance” FLOPS numbers should be taken with a grain of salt.
Yeah, but in this case the best convolution and GEMM codes can reach something like 98% of peak efficiency for the simple standard algorithms and dense input, which is what most ANNs use for about everything.
given that a TitanX apparently draws as much as 240W of power at full load, your “petaflop-scale supercomputer” will cost you a few hundred-thousand dollars and draw 42kW to do what the brain does within 20W or so
Well, in this case of Go and for an increasing number of domains, it can do far more than any brain, and it learns far faster. Also, the current implementations are very, very far from optimal form. There is at least another 100x to 1000x easy perf improvement in the years ahead. So what 100 GPUs can do now will be accomplished by a single GPU in just a year or two.
It’s just a circuit, and it obeys the same physical laws.
Of course. Neuroglia are not magic or “woo”. They’re physical things, much like silicon chips and neurons.
Right, and they use a small fraction of the energy budget, and thus can’t contribute much to the computational power.
Well, in this case of Go and for an increasing number of domains, it can do far more than any brain, and it learns far faster.
This might actually be the most interesting thing about AlphaGo. Domain experts who have looked at its games have marveled most at how truly “book-smart” it is. Even though it has not shown a lot of creativity or surprising moves (indeed, it was comparatively weak at the start of Game 1), it has fully internalized its training and can always come up with the “standard” play.
Right, and they use a small fraction of the energy budget, and thus can’t contribute much to the computational power.
Not necessarily—there might be a speed vs. energy-per-op tradeoff, where neurons specialize in quick but energy-intensive computation, while neuroglia just chug along in the background. We definitely see such a tradeoff in silicon devices.
Domain experts who have looked at its games have marveled most at how truly “book-smart” it is. Even though it has not shown a lot of creativity or surprising moves (indeed, it was comparatively weak at the start of Game 1), it has fully internalized its training and can always come up with the “standard” play.
Do you have links to such analyses? I’d be interested in reading them.
EDIT: Ah, I guess you were referring to this: https://www.reddit.com/r/MachineLearning/comments/43fl90/synopsis_of_top_go_professionals_analysis_of/