I disagree with your analysis of “are we that ignorant?”.
For things like nuclear war or financial meltdown, we’ve got lots of relevant data, and not too much reason to expect new risks. For advanced nanotechnology, I think we are ignorant enough that a 10% chance sounds right (I’m guessing it will take something like $1 billion in focused funding).
With AGI, ML researchers can be influenced to change their forecast by 75 years by subtle changes in how the question is worded. That suggests unusual uncertainty.
We can see from Moore’s law and from ML progress that we’re on track for something at least as unusual as the industrial revolution.
The stock and bond markets do provide some evidence of predictability, but I’m unsure how good they are at evaluating events that happen much less than once per century.
I’m kind of surprised that the post doesn’t mention the other, larger discontinuities that they’ve found: nuclear weapons, high-temperature superconduction, and building height.
Plus, it has been argued that the next AI winter is well on its way, i.e. that we are actually starting to see a decline, rather than a further increase, in interest in AI.
Metaculus has the closest thing to a prediction market on this topic that I’m aware of, which is worth looking at.
Unfortunately, interpreting expert opinion is tricky. On the one hand, in some surveys machine learning researchers put non-negligible probability on “human-level intelligence” (whatever that means) in 10 years. On the other hand, my impression from interacting with the community is that the predominant opinion is still to confidently dismiss a short timeline scenario, to the point of not even seriously engaging with it.
The linked survey is the most comprehensive survey that I’m aware of, and it points to the ML community collectively putting ~10% chance on HLAI in 10 years. I think that if I thought that one should defer to expert opinion, I would put a lot of weight on this survey and very little on the interactions that the author of this piece has had. That being said, the survey also (in my view) shows that the ML community is not that great at prediction.
All in all, my main disagreement with this post is about the level of progress that we’ve seen and are likely to see. It seems like ML has been steadily gaining a bunch of relevant capacities, and that the field has a lot of researchers capable of bringing it forward through both incremental and fundamental research. The author implicitly thinks that this is nowhere near enough for AGI in 10 years; my broad judgement is that it makes that achievement not unthinkable, though it’s hard to fully lay out the relevant reasons for that judgement.
I think you are missing that AI-related performance and hardware growth had a discontinuity around 2012 and is now growing on many important metrics at roughly 10x a year (a doubling time of 3.5 months; see e.g. OpenAI’s “AI and Compute”). I collected other evidence for 10 per cent in 10 years here.
This strongly predicts a large decline in the rate of compute-based AI progress coming pretty soon. From the OpenAI article:
We believe the largest training runs today employ hardware that cost in the single digit millions of dollars to purchase (although the amortized cost is much lower).
Most of the increase in hardware is due to higher spending. If we assume the speed grows by 10x/year and then adjust for Moore’s law, then the spending increase is 6.3x/year. For the exponential trend to continue for 7 years, AI training runs will be using hardware that costs hundreds of billions of dollars, which is implausible. So this rate of progress will almost certainly slow before 7 years.
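A back-of-the-envelope check of that arithmetic (a sketch only; it assumes Moore’s law means a doubling of FLOPS/$ every ~18 months and takes “single digit millions” as roughly $1M of hardware for today’s largest runs):

```python
# Rough check of the spending-growth argument; the growth rates are assumptions
# taken from this thread, not measured values.
compute_growth = 10.0                # assumed compute growth per year (OpenAI trend)
moores_law = 2 ** (12 / 18)          # ~1.59x/year FLOPS/$ if it doubles every 18 months
spend_growth = compute_growth / moores_law
print(f"implied spending growth: {spend_growth:.1f}x/year")            # ~6.3x/year

start_cost = 1e6                     # low end of "single digit millions" of dollars
years = 7
end_cost = start_cost * spend_growth ** years
print(f"implied hardware cost after {years} years: ${end_cost:,.0f}")  # ~$400 billion
```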
We should adjust not by general Moore’s law, but by the speed of performance increase in specialised neural-net hardware, which is currently around 4-10x a year, in the form of various TPUs, ASICs and neuromorphic chips, like Akida, which has 1 million neurons and 10 billion synapses.
Extrapolating such trends puts roughly human-level performance within reach of AI scientists somewhere in the 2020s. It may not require a new technological process in the lithography sense, as the amount of AI-related computation inside a single chip will increase via architectural tricks rather than via shrinking transistors.
Also, given the arms race in AI, I would not be surprised if the biggest players were able to spend tens of billions of dollars on hardware.
Finally, “7 years” is within the predicted 10-year period in which AGI may appear; that is, even if the fast hardware growth ends, the resulting hardware overhang may be enough to run very powerful AIs.
TPUs don’t increase FLOPS/$ compared to GPUs; they increase serial speed.
The numbers you’re giving for hardware increases are really high, what are the sources? I would only believe that NN-specific hardware is increasing at that rate in the last 5 years if it started out much, much worse than GPUs (which is not a good indication that it can get much better than GPUs in the future).
At the high end, NN-specific hardware would provide a total 40x improvement in the next 10 years, letting the trend go 2 extra years. Hundreds of billions is really high, though; if only tens of billions are possible, then in total the maximum probably amounts to 7-8 years.
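For reference, here is where the “2 extra years” comes from (a sketch; the 40x figure is the high-end guess above, and the 6.3x/year spending growth is taken from the earlier comment):

```python
import math

spend_growth = 6.3        # implied spending growth per year from the earlier estimate
hardware_gain = 40        # assumed total gain from NN-specific hardware over 10 years
extra_years = math.log(hardware_gain) / math.log(spend_growth)
print(f"extra years of trend bought by a {hardware_gain}x hardware gain: {extra_years:.1f}")  # ~2.0
```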
If people were already updating on the rate of recent AI progress (which is empirically more based on log compute than compute) and expecting it to continue linearly, then they should adjust their expectations downward. I agree that this isn’t that strong of an argument against AGI in 10 years, but it is a somewhat strong argument against AGI in 20 years given no AGI in 10 years.
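A minimal sketch of why that implies a slowdown, assuming (per the parenthetical above) that progress tracks log compute rather than compute:

```python
import math

# Progress proxy: log10 of training compute (an assumption, not a measured relationship).
fast_growth = 10.0               # current trend: ~10x compute per year
slow_growth = 2 ** (12 / 18)     # hardware-only growth once spending is capped, ~1.6x/year

print(round(math.log10(fast_growth), 2))   # 1.0 "progress units" per year while spending scales
print(round(math.log10(slow_growth), 2))   # ~0.2 per year afterwards, i.e. roughly a 5x slowdown
```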
FLOPS is a bad measure for neural-net training, as NNs typically employ simpler operations with less precision. “TOPS”—trillions of operations per second—is now widely used to measure them.
The “4-10x a year” growth is Huang’s law, formulated by NVIDIA CEO Jensen Huang to describe the speed of growth of GPU computation. Link.
An example of this law is NVIDIA’s DGX-2 computer, released in March 2018 for $400K with 2 petaflops of deep-learning performance, which is said to be 10 times faster at neural-net training than the DGX-1 system from 2017, which cost $149K (this implies roughly a 4x increase in cost-effectiveness in one year). Source.
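The “4 times” figure is just the claimed speedup divided by the price ratio (a sketch, taking NVIDIA’s 10x training-speed claim at face value):

```python
speedup = 10                          # vendor-claimed DGX-2 vs DGX-1 training speedup
price_ratio = 400_000 / 149_000       # ~2.7x
print(f"cost-effectiveness gain: {speedup / price_ratio:.1f}x in one year")  # ~3.7x, i.e. roughly 4x
```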
In the same way, Google’s TPU is improving very quickly:
TPU 1st generation − 2016 − 92 TOPS, but inference only (that is, no training)
TPU 2nd generation − 2017 − 180 teraflops in deep learning
TPU 3rd generation − 2018 − 8 times the performance of TPU2, link
I don’t get why you think that NN-specific hardware will produce only a 40x improvement in the next years. There are several ways to dramatically improve NN hardware beyond tensor processing units, using even more specialised systems:
“· Intel has promised to increase neural net performance 100× by 2020 (from 2017) by use of specialized chip-accelerators they called Nervana.
· 3D chips. A 3D System combining memory and computing cores on a chip may increase energy efficiency 1000 times and computational speed more than 50 times by 2021 by eliminating memory bottlenecks, according to DARPA.
· FPGA. These programmable chips could combine the efficiency of TPUs with the speed of ordinary computers. Fujitsu claims to have optimized FPGA architecture to be 10,000 times faster (Fujitsu, 2016).
· Memristors. Memristors seem to enable more efficient neural networks (Du et al., 2017; Kaplan, Yavits, & Ginosar, 2018). They could be the basis for physical neural nets, which could be especially effective in inference, as each memristor will replace one synapse.
· Spiking neural nets. The TrueNorth chip from IBM provides 10 000× the energy economy of conventional chips and could solve the same tasks as ordinary neural nets after compilation (Hsu, 2014). IBM also invented in 2018 a system of analogue synapses which provides 100 times the power economy and imposes less load on the information-transfer bus, as the synapses are trained “locally”, as in the human brain (Ambrogio et al., 2018).
· Non-von-Neumann architectures. DARPA is exploring a new type of computing called HIVE, which could offer a 1000× boost in computational power; its advantage is “its ability to simultaneously perform different processes on different areas of memory simultaneously”, and it works with data graphs (Johnson, 2017).” (self-cite from a draft on the topic)
- In-memory computing could provide 100 times growth, link.
All this doesn’t take into account several other ways to increase performance, such as running programs faster on existing hardware via specialised languages, and possible progress in algorithms.
Moreover, even if we ignore all this evidence and just take the claim about a neuromorphic chip with 1 million neurons, like Akida, at face value—and it is likely that such chips will appear in the early 2020s—then the performance of such a single chip, given its possible clock speed of hundreds of megahertz, will be comparable with some estimates of human-brain performance:
Roughly: a million neurons running a million times faster than human ones = the equivalent of 1 trillion neurons, which is more than the human brain.
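The rough arithmetic behind that claim (a sketch; the clock speed, the biological firing rate, and the neuron count are all loose assumptions):

```python
chip_neurons = 1_000_000     # claimed neuron count of an Akida-like chip
chip_clock_hz = 300e6        # "hundreds of megahertz" (assumed)
bio_firing_hz = 300          # rough upper bound on biological firing rate (assumed)
human_neurons = 8.6e10       # ~86 billion neurons in the human brain

speedup = chip_clock_hz / bio_firing_hz          # ~1e6
effective_neurons = chip_neurons * speedup       # ~1e12, i.e. ~1 trillion neuron-equivalents
print(f"{effective_neurons:.0e} effective neurons, {effective_neurons / human_neurons:.0f}x the human brain")
```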
Obviously, this is not a proof of AGI capability, but a situation in which a single chip has performance comparable with the human brain is a prompt to worry, and it is reasonable to give 10 per cent probability to AGI appearing in the next 10 years.
The “4-10x a year” growth is Huang’s law, formulated by NVIDIA CEO Jensen Huang to describe the speed of growth of GPU computation. Link.
This is an absurd overestimate and it wasn’t even in the article you linked. From the article:
Just how fast does GPU technology advance? In his keynote address, Huang pointed out that Nvidia’s GPUs today are 25 times faster than five years ago. If they were advancing according to Moore’s Law, he said, they would have increased their speed only by a factor of 10.
It’s possible to get overestimates/underestimates by changing the time period, but even taking this 5-year time period at face value, the increase per year is 25^(1/5) ≈ 1.9x, not 4-10x.
(What about AlexNet performance? Given that the GPUs are only 25 times faster, this has to be mostly software improvements, not hardware improvements. If taken at face value, a 500x improvement over 5 years is a 3.5x/year multiplier, still lower than your estimate)
Looking at this graph of GPU performance over time, it looks like NVIDIA GeForce GPUs increase by 10x in an 8-year period, i.e. about 1.33x/year, slower than Moore’s law.
Looking at AI Impacts’ estimate, the improvement is an “order of magnitude about every 10-16 years”, i.e. even slower than the estimate in the previous paragraph.
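A quick check of the per-year multipliers quoted in the last few paragraphs (each is just the total improvement raised to the power 1/years):

```python
def per_year(total_improvement, years):
    """Annualised growth factor implied by a total improvement over a period."""
    return total_improvement ** (1 / years)

print(round(per_year(25, 5), 2))     # Huang keynote figure: ~1.9x/year
print(round(per_year(500, 5), 2))    # AlexNet "500x" (mostly software): ~3.5x/year
print(round(per_year(10, 8), 2))     # GeForce graph, 10x over 8 years: ~1.33x/year
print(round(per_year(10, 13), 2))    # AI Impacts, 10x per ~10-16 years (13-year midpoint): ~1.19x/year
```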
In summary, Huang’s law is marketing, and GPUs are improving no faster than Moore’s law.
Huang’s law is marketing, but Moore’s law is also marketing. However, the fact that the DGX-2 outperforms the previous DGX-1 by 10x in performance and 4x in cost efficiency on deep-learning applications after one year implies that there is some real substance here.
The GPU performance graph you posted is from 2013 and is probably obsolete; anyway, I don’t claim that GPUs are better than CPUs in terms of FLOPS. I am pointing to the fact that GPUs, and later TPUs (which did not exist in 2013), are more capable at much simpler operations on large matrices, which, however, is what we need for AI.
The AI Impacts article probably also suffers from the same problem: they still count FLOPS, but we need TOPS to estimate actual performance on neural-net tasks.
Update: for example, in 2008 NVIDIA’s top cards had performance around 400 gigaflops, and in 2018 around 110,592 gigaflops in tensor operations (wiki), which implies something like 1000 times growth in 10 years, not an “order of magnitude about every 10-16 years”. (This may not add up to the previous claim of 4-10x growth, but that claim applies not to GPUs but to TPUs—that is, ASICs specialised for neural-net calculations, which appeared only 2-3 years ago; the field is growing very quickly, most visibly in the form of Google’s TPU.)
Yes, both are marketing and the reality is that GPUs are improving significantly slower than Moore’s law.
The time period I looked at in the graph is 2008-2016.
I don’t see how TOPS are relevant if GPUs still have the best price performance. I expect FLOPS and TOPS to scale linearly with each other for GPUs, i.e. if FLOPS increases by 2x in some time period then TOPS also will. (I could be wrong but this is the null hypothesis)
Regarding the DGX-1 and DGX-2: you can’t extrapolate a medium-term trend from two data points one year apart like that; that’s completely absurd. Because the DGX-2 has only 2 times the FLOPS of the DGX-1 (while being 3x as expensive), I assume the 10x improvement is due to a discontinuous improvement in tensorization (similar to TPUs) that can’t be extrapolated.
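To make that concrete (a sketch using the ratios quoted in this sub-thread): in raw FLOPS per dollar the DGX-2 is no better than the DGX-1, so the claimed 10x training speedup has to come from something other than a continuing FLOPS/$ trend.

```python
flops_ratio = 2.0                       # DGX-2 has ~2x the FLOPS of DGX-1
price_ratio = 400_000 / 149_000         # ~2.7x the price
print(f"FLOPS/$ ratio, DGX-2 vs DGX-1: {flops_ratio / price_ratio:.2f}x")  # ~0.75x, i.e. slightly worse
```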
The GTX 280 (a 2008 card) is 620 GFLOPS, not 400. It cost $650 on release, and the card to compare it to (the 2017 Titan V; I think you meant the 2018 one, but it doesn’t change things significantly) costs $3000. The difference in price performance is 110000/620 * 650/3000 ≈ 38x over 9 years, slower than Moore’s law. We are talking about price performance, not absolute performance, here, since that is what this thread is about (economic/material constraints on the growth of compute).
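Redoing that comparison explicitly (a sketch using the numbers quoted in this thread, and assuming Moore’s law means a doubling of FLOPS/$ every 18 months):

```python
old_gflops, old_price = 620, 650        # GTX 280 (2008)
new_gflops, new_price = 110_000, 3000   # Titan V tensor-op GFLOPS, as quoted
years = 9

price_perf_gain = (new_gflops / new_price) / (old_gflops / old_price)
moores_law_gain = 2 ** (years / 1.5)

print(f"price-performance gain: {price_perf_gain:.0f}x over {years} years")  # ~38x
print(f"Moore's law over the same period: {moores_law_gain:.0f}x")           # ~64x
```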
The signal to noise ratio in numbers in your comments is so low that I’m not trusting anything you’re saying and engaging further is probably not worth it.
Thanks for participating in an interesting conversation, which helped me to clarify my position.
As I now see it, the accelerated growth above Moore’s-law levels started only around 2016, and is related not to GPUs, which grew rather slowly, but to specialised hardware for neural nets, like Tensor cores, Google’s TPU, and neuromorphic chips like TrueNorth and Akida. Neuromorphic chips could give higher acceleration for NNs than Tensor cores, but have not yet hit the market.