Current ANN engines can already train and run models with around 10 million neurons and 10 billion (compressed/shared) synapses on a single GPU, which suggests that the goal could soon be within the reach of a large organization.
This suggests 15000 GPUs is equivalent in computing power to a human brain, since we have about 150 trillion synapses? Why did you suggest 1000 earlier? How much of a multiplier on top of that do you think we need for trial-and-error research and training, before we get the first AGI? 10x? 100x? (If it isn’t clear, I mean that if you only have hardware that’s equivalent to a single human brain, it might take a few years to train it to exhibit general intelligence, which seems too slow for research that’s based on trying various designs to see which one works.)
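(The arithmetic behind that 15,000 figure, as a quick sketch in Python, using the round numbers quoted above:)

```python
# Rough arithmetic behind the 15,000-GPU figure (round numbers only):
human_synapses = 150e12      # ~150 trillion synapses in the human brain
synapses_per_gpu = 10e9      # ~10 billion compressed/shared synapses per GPU, per the post
print(human_synapses / synapses_per_gpu)   # 15000.0 GPUs for one brain-sized model
```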
What kind of leverage can we exert on a short timescale?
Seems like a very good question that has been largely neglected. I know your ideas for training/testing neuromorphic AGI in VR environments. What other ideas do people have, or have you seen? I wonder what Shane Legg’s plan is, given that he is worried about existential risk from AI, and also personally (as co-founder of DeepMind) racing to build neuromorphic AGI.
This suggests 15000 GPUs is equivalent in computing power to a human brain, since we have about 150 trillion synapses? Why did you suggest 1000 earlier?
ANN-based AGI will not need to reproduce brain circuits exactly. There are general tradeoffs between serial depth and circuit size. The brain is much more latency/speed constrained, so it uses larger, shallower circuits, whereas we can leverage much higher clock speeds to favour deeper, smaller circuits. You see the same tradeoffs in circuit design, and also in algorithms, where parallel variants always use more ops than the minimal fully serial variant.
Also, independent of those considerations, biological circuits and synapses are redundant, noisy, and low precision.
If you look at raw circuit-level ops/second, the brain’s throughput is not that high. A deeper investigation of the actual theoretical minimum computation required to match the human brain would be a subject for a whole post (and one I may not want to write up just yet). With highly efficient future tech, I’d estimate that it would take far less than 10^15 32-bit ops/s (1,000 GPUs): probably around or less than 10^13 32-bit ops/s. So we may already be entering a hardware overhang situation.
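A minimal numerical sketch of those estimates; the 10^13 figure is, as stated, a rough guess rather than a measurement:

```python
gpu_flops = 1e12               # raw throughput of a high end 2015-era GPU, ~1 teraflop/s
naive_brain_ops = 1e15         # 32-bit ops/s under the straightforward synapse-counting estimate
efficient_brain_ops = 1e13     # rough guess at the true requirement with efficient software

print(naive_brain_ops / gpu_flops)      # 1000.0 GPUs per brain under the naive estimate
print(efficient_brain_ops / gpu_flops)  # 10.0 GPUs per brain -> a hardware overhang, if correct
```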
How much of a multiplier on top of that do you think we need for trial-and-error research and training, before we get the first AGI? 10x? 100x?
One way to estimate that is to compare to the number of full train/test iterations required to reach high performance in particular important sub-problems such as vision. The current successful networks all descend from designs invented in the 1980s or earlier. Most of the early iterations were on small networks, and I expect the same to continue to be true for whole AGI systems.
Let’s say around 100 researchers worked full time on CNNs for 40 years straight (4,000 researcher-years), and each tested 10 designs per year, so roughly 40,000 iterations to go from perceptrons to CNNs. A more accurate model should consider the distribution over design iteration times and model sizes. Major new risky techniques are usually tested first on small problems and models and then scaled up.
So anyway, let’s multiply by roughly 20 and say it takes a million AGI ‘lifetimes’ or full test iterations. If each lifetime is 10 years and each AGI year requires 10 GPU-years, this suggests 100 million GPU-years, or around 100 billion dollars.
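Spelled out, the rough arithmetic (every input here is one of the order-of-magnitude guesses above):

```python
researcher_years = 100 * 40                    # ~100 researchers working ~40 years
design_iterations = researcher_years * 10      # ~10 designs tested per researcher-year -> 40,000
agi_lifetimes = 1_000_000                      # ~40,000 x ~20, rounded up to a million
gpu_years = agi_lifetimes * 10 * 10            # 10 years per lifetime, 10 GPU-years per AGI-year
cost_dollars = gpu_years * 1000                # ~$1,000 per GPU-year

print(gpu_years)      # 100000000 -> 100 million GPU-years
print(cost_dollars)   # 100000000000 -> ~$100 billion
```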
Another, more roundabout, estimation: it seems that whenever researchers have the technical capability to create ANNs of size N, it doesn’t take many years to explore and discover what can be built with systems of that size. Software seems to catch up fast. Much of this effect could also be industry scaling up investment, but we can expect that to continue to accelerate.
What other ideas do people have, or have you seen? I wonder what Shane Legg’s plan is, given that he is worried about existential risk from AI, and also personally (as co-founder of DeepMind) racing to build neuromorphic AGI.
I’m not sure. He hasn’t blogged in years. I found this, which quotes Legg as saying:
‘Eventually, I think human extinction will probably occur, and technology will likely play a part in this,’ DeepMind’s Shane Legg said in a recent interview.
So presumably he is thinking about it, although that quote suggests he perhaps thinks extinction is inevitable. The most recent interview I can find is this, which doesn’t seem much related.
this suggests 100 million GPU-years, or around 100 billion dollars.
Hmm, I was trying to figure out how much of a speed superintelligence the first AGI will likely be. In other words, how much computing power will a single lab have accumulated by the time we get AGI? As a minimum, it seems that a company like Google could easily spend $100M to purchase 100,000 GPUs for AGI research, and if initially 1000 GPUs = 1x human speed, that implies the first AGI is at least a 100x speed superintelligence (which could speed up to 10000x on the same hardware through future software improvements, if I’m understanding you correctly).
Also, question about GPU/AGI costs. Here you seem to be using $1000 per GPU-year, which equals $.11 per GPU-hour, but in that previous thread, you used $1 per GPU-hour. According to this discussion $.11 seems close to the actual cost. Assuming $.11 is correct, AGI would be economically competitive with (some types of) human labor today at 1000 GPUs = 1x human speed, but maybe there’s not a huge economic incentive to race for it yet. (I mean, unless one predicts that GPU costs will keep falling in the future, and therefore wants to prepare for that.)
Nvidia is claiming that its next generation of GPU is 10x better for deep learning. How much of that is hype?
Hmm, I was trying to figure out how much of a speed superintelligence the first AGI will likely be. As a minimum, it seems that a company like Google could easily spend $100M to purchase 100,000 GPUs for AGI research, and if initially 1000 GPUs = 1x human speed, that implies the first AGI is at least a 100x speed superintelligence
My earlier statement about 10 million neurons / 10 billion synapses on a single GPU is something of a gross oversimplification.
A more realistic model is this:
B × flops = M × F × N

where flops is the raw throughput of the hardware in ops/s, B is a software sim efficiency parameter (currently ~1, and roughly doubling per year), M is the number of AI model instances, F is the update frequency in Hz, and N is the number of synapses per instance.
Today’s CPU/GPU ANN solutions need to parallelize over a large number of AI instances to get full efficiency, due to memory and bandwidth issues, so B is ~1 only when M is ~100. On a current high end GPU with 1 trillion flops you can thus run 100 copies of a 1-billion-synapse ANN at 10 Hz (M = 100, F = 10, N = 1 billion), whereas a single copy on the same GPU may run at only around 50 Hz (B ~0.05, 20x less efficient). Training is accelerated mainly by parallel speedup over instances rather than serial speedup of a single instance.
So with 1,000 GPUs and today’s tech, in theory you could get 100 copies of a 1-trillion-synapse ANN running at 10 Hz using model parallelism. 1 trillion synapses @ 10 Hz is borderline plausible; 10 trillion @ 100 Hz is probably more realistic and would entail 100,000 GPUs. But this somewhat assumes near-perfect parallel scaling. Communication/latency issues limit the maximum size of realistic models. 100,000 GPUs would be larger than the biggest supercomputers of today, and probably far beyond the limits of practical linear scaling.
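To make the arithmetic in the last two paragraphs concrete, here is a minimal sketch that just solves for B in the formula above; the flops and synapse counts are the rough figures already quoted, not new claims:

```python
def sim_efficiency(raw_flops, m_instances, f_hz, n_synapses):
    """Solve for B in B * raw_flops = M * F * N."""
    return m_instances * f_hz * n_synapses / raw_flops

gpu = 1e12   # one high end 2015-era GPU, ~1 teraflop/s

# 100 copies of a 1-billion-synapse ANN at 10 Hz on one GPU: full efficiency
print(sim_efficiency(gpu, 100, 10, 1e9))              # 1.0
# A single copy of the same ANN at ~50 Hz: ~20x less efficient
print(sim_efficiency(gpu, 1, 50, 1e9))                # 0.05
# 1,000 GPUs: 100 copies of a 1-trillion-synapse ANN at 10 Hz (assumes perfect scaling)
print(sim_efficiency(1000 * gpu, 100, 10, 1e12))      # 1.0
# 100,000 GPUs: 100 copies of a 10-trillion-synapse ANN at 100 Hz
print(sim_efficiency(100_000 * gpu, 100, 100, 1e13))  # 1.0
```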
So 1,000 2015-era GPUs = 1 brain only in an amortized, rough sense. In practice I expect there is a minimum amount of software and hardware speedup required first to make these very large ANNs realistic or feasible in the first place, because of weak scaling issues in supercomputers. But once you get over this minimum barrier, there is pretty large room for sudden speedup.
And finally, parallel model speedup seems to be almost as effective as serial speedup, and is more powerful than the equivalent parallel scaling in human organizations, because the AI instances all share the same ANN model or mind and thus learn in parallel.
As a minimum, it seems that a company like Google could easily spend $100M to purchase 100,000 GPUs for AGI research, and if initially 1000 GPUs = 1x human speed, that implies the first AGI is at least a 100x speed superintelligence (which could speed up to 10000x on the same hardware through future software improvements, if I’m understanding you correctly).
Ya, this sounds about right. However, this is predicated on a roughly $100 billion initial investment in 1 million AGI ‘lifetimes’ for research. If that were spread out over just 5 years, it would correspond to a population of about a million AGIs at the end. In other words, it’s unlikely that research success would result in only $100 million worth of AGIs.
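As a rough cross-check of the speed figures (using the earlier guesses of ~10^15 ops/s per real-time brain-equivalent with today’s software, and ~10^13 with mature software):

```python
cluster_flops = 100_000 * 1e12     # 100,000 GPUs at ~1 teraflop/s each, ~$100M of hardware

print(cluster_flops / 1e15)        # 100.0   -> ~100x human speed with today's software efficiency
print(cluster_flops / 1e13)        # 10000.0 -> ~10,000x with future software improvements
```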
Also, question about GPU/AGI costs. Here you seem to be using $1000 per GPU-year, which equals $.11 per GPU-hour, but in that previous thread, you used $1 per GPU-hour.
The earlier $1 per GPU-hour is something I remembered from looking at Amazon prices, but that was a while ago and is probably completely out of date. The cheapest option is probably to buy gaming video cards and build your own custom data center, and that is where the $1,000 per year came from.
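For reference, the conversion between the two price figures:

```python
hours_per_year = 365 * 24                  # 8,760
owned_per_hour = 1000 / hours_per_year     # ~$1,000 per GPU-year for owned gaming cards
cloud_per_hour = 1.0                       # the older ~$1/hour cloud rental figure

print(round(owned_per_hour, 3))            # 0.114 -> roughly the $.11/hour quoted above
print(round(cloud_per_hour / owned_per_hour, 1))   # ~8.8x markup for renting vs. owning
```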
Assuming $.11 is correct, AGI would be economically competitive with (some types of) human labor today at 1000 GPUs = 1x human speed, but maybe there’s not a huge economic incentive to race for it yet.
Yes, in theory if we had the right sim code and AGI structure, I think we could run it today and replace all kinds of human labor. In some sense this has already started—but so far ANNs are automating only some specific simple jobs like coming up with image captions.
Nvidia is claiming that its next generation of GPU is 10x better for deep learning. How much of that is hype?
Jen said the 10x was ‘CEO math’, but I still don’t get that figure. 2x is expected from the new architecture and process, and then 2x more from the FP16 extensions, so 4x is reasonable. More importantly, the bandwidth improvement is claimed to be about 4x or 5x as well.
Thanks for the explanations.