this suggests 100 million GPU years or around 100 billion dollars.
Hmm, I was trying to figure out how much of a speed superintelligence the first AGI will likely be. In other words, how much computing power will a single lab have accumulated by the time we get AGI? As a minimum, it seems that a company like Google could easily spend $100M to purchase 100,000 GPUs for AGI research, and if initially 1000 GPUs = 1x human speed, that implies the first AGI is at least a 100x speed superintelligence (which could speed up to 10000x on the same hardware through future software improvements, if I’m understanding you correctly).
Also, question about GPU/AGI costs. Here you seem to be using $1000 per GPU-year, which equals $.11 per GPU-hour, but in that previous thread, you used $1 per GPU-hour. According to this discussion $.11 seems close to the actual cost. Assuming $.11 is correct, AGI would be economically competitive with (some types of) human labor today at 1000 GPUs = 1x human speed, but maybe there’s not a huge economic incentive to race for it yet. (I mean, unless one predicts that GPU costs will keep falling in the future, and therefore wants to prepare for that.)
Nvidia is claiming that its next generation of GPU is 10x better for deep learning. How much of that is hype?
Hmm, I was trying to figure out how much of a speed superintelligence the first AGI will likely be. As a minimum, it seems that a company like Google could easily spend $100M to purchase 100,000 GPUs for AGI research, and if initially 1000 GPUs = 1x human speed, that implies the first AGI is at least a 100x speed superintelligence
My earlier statement about 10 million neurons / 10 billion synapses on a single GPU is something of a gross oversimplification.
A more realistic model is this:
B * flops = M * F * N

where B is a software simulation efficiency parameter (currently ~1, and roughly doubling per year), M is the number of AI model instances, F is the frequency in Hz, and N is the number of synapses.
Today's CPU/GPU ANN solutions need to parallelize over a large number of AI instances to reach full efficiency, due to memory and bandwidth issues, so B is ~1 only when M is ~100. On a current high end GPU with 1 trillion flops you can thus run 100 copies of a 1 billion synapse ANN at 10 Hz (M = 100, F = 10, N = 1 billion), whereas a single copy on that GPU may run at only ~50 Hz (B ~0.05, about 20x less efficient). Training is accelerated mainly by parallel speedup over instances rather than serial speedup of a single instance.
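To make the relation concrete, here is a minimal Python sketch (my own illustration, not code from the original discussion) that plugs the single-GPU numbers above into B * flops = M * F * N:

```python
# Minimal sketch of the relation B * flops = M * F * N,
# using the single-GPU numbers from the discussion above.

def required_flops(M, F, N, B):
    """Flops needed to run M model instances of N synapses at F Hz,
    given software simulation efficiency B."""
    return M * F * N / B

GPU_FLOPS = 1e12  # ~1 Tflops for a high end GPU, the figure used above

# 100 copies of a 1-billion-synapse ANN at 10 Hz, with B ~ 1:
print(required_flops(M=100, F=10, N=1e9, B=1.0) / GPU_FLOPS)   # ~1.0 GPU

# A single copy at ~50 Hz only fits if B drops to ~0.05 (20x less efficient):
print(required_flops(M=1, F=50, N=1e9, B=0.05) / GPU_FLOPS)    # ~1.0 GPU
```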
So with 1000 GPUs and today's tech, in theory you could get 100 copies of a 1 trillion synapse ANN running at 10 Hz using model parallelism. 1 trillion synapses @ 10 Hz is borderline plausible; 10 trillion @ 100 Hz is probably more realistic and would entail 100,000 GPUs. But this somewhat assumes near perfect parallel scaling. Communication/latency issues limit the maximum size of realistic models, and 100,000 GPUs would be larger than the biggest supercomputers of today, probably far beyond the limits of practical linear scaling.
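Extending the same relation to cluster scale (again just an illustrative sketch, assuming B = 1, 1 Tflops per GPU, and perfect linear scaling across the cluster):

```python
# GPU counts implied by B * flops = M * F * N, assuming B = 1,
# 1e12 flops per GPU, and perfect linear scaling across GPUs.

def gpus_needed(M, F, N, B=1.0, gpu_flops=1e12):
    return M * F * N / (B * gpu_flops)

print(gpus_needed(M=100, F=10, N=1e12))    # ~1,000 GPUs   (1T synapses @ 10 Hz)
print(gpus_needed(M=100, F=100, N=1e13))   # ~100,000 GPUs (10T synapses @ 100 Hz)
```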
So it's only 1000 2015 GPUs = 1 brain in an amortized, rough sense. In practice I expect a minimum amount of software & hardware speedup is required to make these very large ANNs feasible in the first place, because of weak scaling issues in supercomputers. But once you get over this minimum barrier, there is pretty large room for sudden speedup.
And finally—parallel model speedup seems to be almost as effective as serial speedup, and is more powerful than the equivalent parallel scaling in human organizations—because the AI instances all share the same ANN model or mind and thus learn in parallel.
As a minimum, it seems that a company like Google could easily spend $100M to purchase 100,000 GPUs for AGI research, and if initially 1000 GPUs = 1x human speed, that implies the first AGI is at least a 100x speed superintelligence (which could speed up to 10000x on the same hardware through future software improvements, if I’m understanding you correctly).
Ya, this sounds about right. However, this is predicated on a roughly $100 billion initial investment in 1 million AGI 'lifetimes' for research. If that were spread out over just 5 years, it would correspond to a population of about a million AGIs by the end. In other words, it's unlikely that research success would result in only $100 million worth of AGIs.
Also, question about GPU/AGI costs. Here you seem to be using $1000 per GPU-year, which equals $.11 per GPU-hour, but in that previous thread, you used $1 per GPU-hour.
The earlier $1 per GPU-hour is something I remembered from looking at Amazon prices, but that was a while ago and is probably completely out of date. The cheapest option is probably to buy gaming video cards and build your own custom data center, and that is where the $1000 per GPU-year figure came from.
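For reference, a quick back-of-envelope check on how those two figures relate (my own arithmetic, using the 1000 GPUs = 1x human speed assumption from above):

```python
# Back-of-envelope check on the GPU cost figures discussed above.
HOURS_PER_YEAR = 24 * 365  # 8760

own_hardware = 1000.0                    # $ per GPU-year (build-your-own estimate)
print(own_hardware / HOURS_PER_YEAR)     # ~$0.11 per GPU-hour

cloud_rate = 1.0                         # $ per GPU-hour (older Amazon figure)
print(cloud_rate * HOURS_PER_YEAR)       # ~$8,760 per GPU-year

# At 1000 GPUs = 1x human speed, one human-speed AGI would run for roughly:
print(1000 * own_hardware)               # ~$1M per year, i.e. ~$114 per hour
```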
Assuming $.11 is correct, AGI would be economically competitive with (some types of) human labor today at 1000 GPUs = 1x human speed, but maybe there’s not a huge economic incentive to race for it yet.
Yes, in theory if we had the right sim code and AGI structure, I think we could run it today and replace all kinds of human labor. In some sense this has already started—but so far ANNs are automating only some specific simple jobs like coming up with image captions.
Nvidia is claiming that its next generation of GPU is 10x better for deep learning. How much of that is hype?
Jen said the 10x was 'CEO math', but I still don't get that figure. 2x is expected from the new architecture and process, and then another 2x from the FP16 extensions, so 4x is reasonable. More importantly, the claimed bandwidth improvement is about 4x or 5x as well.
Thanks for the explanations.