The next AI winter will be due to energy costs

hippke 24 Nov 2020 16:53 UTC

70 points

Summary: We are 3 orders of magnitude from the Landauer limit (calculations per kWh). After that, progress in AI cannot come from throwing more compute at known algorithms. Instead, new methods must be developed. This may cause another AI winter, where the rate of progress decreases.

Over the last 8 decades, the energy efficiency of computers has improved by 15 orders of magnitude. Chips manufactured in 2020 feature 16 bn transistors on a 100 mm² area. The switching energy per transistor is only $3 \times 10^{-18}$ J (see Figure). This remarkable progress brings us close to the theoretical limit of energy consumption for computations, the Landauer principle: “any logically irreversible manipulation of information, such as the erasure of a bit or the merging of two computation paths, must be accompanied by a corresponding entropy increase in non-information-bearing degrees of freedom of the information-processing apparatus or its environment”.

Figure: Switching energy per transistor over time. Data points from Landauer (1988), Wong et al. (2020), own calculations.

The Landauer limit of $kT \ln 2$ is, at room temperature, $3 \times 10^{-21}$ J per bit erasure. Compared to this, 2020 chips (TSMC 5 nm node) consume about 1,175 times as much energy. After improving by 15 orders of magnitude, we are therefore getting close to the limit: only about 3 orders of magnitude of improvement are left. A computation which costs 1,000 USD in energy today may cost as little as 1 USD in the future (assuming the same price in USD per kWh). However, further order-of-magnitude improvements of classical (irreversible) computers are forbidden by physics.
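The limit and the remaining headroom can be checked in a few lines. The following is a minimal Python sketch, assuming T = 300 K for "room temperature" and the rounded switching energy of $3 \times 10^{-18}$ J from the figure. With that rounded input the ratio comes out near 1,045x rather than 1,175x. The figure above presumably rests on an unrounded switching energy, but either way the headroom is about $10^3$. The 1,000 USD to 1 USD example is simply this factor applied to the energy bill.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant in J/K (exact SI value)
JOULES_PER_KWH = 3.6e6   # 1 kWh = 3.6 MJ


def landauer_limit(temperature_k: float) -> float:
    """Minimum energy (J) to erase one bit at the given temperature: kT ln 2."""
    return K_B * temperature_k * math.log(2)


def headroom(switch_energy_j: float, temperature_k: float = 300.0) -> float:
    """Factor by which a switching energy exceeds the Landauer limit."""
    return switch_energy_j / landauer_limit(temperature_k)


e_min = landauer_limit(300.0)  # assumption: "room temperature" = 300 K
print(f"kT ln 2 at 300 K: {e_min:.2e} J")
print(f"bit erasures per kWh at the limit: {JOULES_PER_KWH / e_min:.2e}")

# Assumption: rounded 2020 switching energy read off the figure (3e-18 J).
ratio = headroom(3e-18)
print(f"headroom: {ratio:,.0f}x, i.e. 10^{math.log10(ratio):.2f}")
```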

At the moment, AI improves rapidly simply because current algorithms yield significant improvements when given more compute. It is often better to double the compute than to work on improving the algorithm. However, compute prices will decrease less rapidly in the future, and then AI will need better algorithms. If these cannot be found as rapidly as compute helped in the past, AI will no longer grow on the same trajectory. Progress slows. Then, a second AI winter can happen.

As a practical example, consider the training of GPT-3, which required $3 \times 10^{23}$ FLOPs. Performed on V100 GPUs (12 nm node), this training would have cost 5m USD (market price, not energy price). The pure energy price would have been 350k USD (assuming V100 GPUs at 300 W for 7 TFLOPs, and 10 ct/kWh). With simple scaling, at the $kT$ limit, one gets $10^{22}$ FLOPs per USD (or $10^{28}$ FLOPs for 1m USD, $10^{31}$ FLOPs for 1 bn USD in energy). With a $kT$-limit computer, one could easily imagine scaling up by 1,000x and training a GPT-4, and perhaps even a GPT-5. But beyond that, new algorithms (and/or a Manhattan Project-level effort) are required.
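The energy figures in this example can be reproduced with a short Python sketch. It uses the stated assumptions: 300 W for 7 TFLOPs per V100, and 10 ct/kWh. It adds two assumptions of its own:

- T = 300 K.
- About $10^6$ transistor switches per FLOP, the estimate given under "Other caveats".

With these, the $kT$-limit figure comes out at roughly $1.3 \times 10^{22}$ FLOPs per USD. That suggests, but does not confirm, that this is how the "simple scaling" was done.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
JOULES_PER_KWH = 3.6e6

# Figures taken from the text.
GPT3_FLOPS = 3e23          # total training compute
V100_WATTS = 300.0         # board power
V100_FLOPS_PER_S = 7e12    # throughput assumed in the text
USD_PER_KWH = 0.10

# Assumptions (not stated in this paragraph): room temperature of 300 K and
# ~1e6 transistor switches per FLOP, each switch costing kT ln 2.
TEMPERATURE_K = 300.0
SWITCHES_PER_FLOP = 1e6


def energy_cost_usd(flops: float, joules_per_flop: float) -> float:
    """Electricity cost of performing `flops` operations at a given energy per FLOP."""
    return flops * joules_per_flop / JOULES_PER_KWH * USD_PER_KWH


def flops_per_usd(joules_per_flop: float) -> float:
    """How many FLOPs one dollar of electricity buys."""
    return JOULES_PER_KWH / (joules_per_flop * USD_PER_KWH)


v100_j_per_flop = V100_WATTS / V100_FLOPS_PER_S
landauer_j_per_flop = SWITCHES_PER_FLOP * K_B * TEMPERATURE_K * math.log(2)

print(f"GPT-3 energy on V100s: {energy_cost_usd(GPT3_FLOPS, v100_j_per_flop):,.0f} USD")
print(f"kT-limit FLOPs per USD: {flops_per_usd(landauer_j_per_flop):.1e}")
print(f"kT-limit FLOPs for 1 bn USD: {1e9 * flops_per_usd(landauer_j_per_flop):.1e}")
```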

Following the current trajectory of node shrinks in chip manufacturing, we may reach the limit in about 20 years.

Arguments that the numbers given above are optimistic:

- A $kT$-type computer assumes that all energy goes into gate flips: no parasitic losses exist, and no interconnects are required. In practice, only part of the energy goes into gate flips. Then, the lower limit is $n \times kT$ with $n \sim 10$ or $n \sim 100$, and the winter will begin in 10 years rather than in 20 years.
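A cruder cross-check of the 20-year and 10-year estimates is to extrapolate a different rate. Instead of node shrinks, this uses the historical rate from the opening paragraph: 15 orders of magnitude in 8 decades, about 0.19 orders per year. The following Python sketch assumes that rate stays constant and takes the 1,175x headroom from the text as the starting point. It gives roughly 16 years for the ideal limit and roughly 11 and 6 years for $n = 10$ and $n = 100$. These are of the same order as the estimates above.

```python
import math

# Assumption: a constant historical improvement rate of ~15 orders of
# magnitude over 80 years. This is a cruder model than extrapolating node
# shrinks, so it is only a cross-check.
ORDERS_GAINED = 15.0
YEARS_ELAPSED = 80.0
RATE = ORDERS_GAINED / YEARS_ELAPSED  # orders of magnitude per year

CURRENT_HEADROOM = 1175.0  # 2020 switching energy / kT ln 2, from the text


def years_left(overhead_n: float = 1.0) -> float:
    """Years until switching energy reaches n * kT ln 2 at the historical rate."""
    remaining_factor = CURRENT_HEADROOM / overhead_n
    if remaining_factor <= 1.0:
        return 0.0  # already at or below the effective limit
    return math.log10(remaining_factor) / RATE


for n in (1, 10, 100):
    print(f"n = {n:>3}: ~{years_left(n):.0f} years")
```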

Arguments that the numbers given are pessimistic:

- The waste heat of a classical computer is typically dissipated into the environment (eventually, into space), often at additional cooling cost. In principle, one could process the waste heat with a heat pump. This process is limited by the Carnot efficiency, which is typically a factor of a few.
- Energy prices (in USD per kWh) may decrease in the future (solar? fusion?).
- If reversible computers could be made, the Landauer limit would not apply. From my limited understanding, it is presently unclear whether such devices could be made in practically useful form.
- I do not understand the impact of quantum computing on AI, or whether such a device can be made in practically useful form.

Other caveats:

- To improve speed, chips use more transistors than minimally required to perform calculations; for example, large die areas are filled with caches. A current estimate for the number of transistor switches per FLOP is $10^{6}$. This number can in principle be reduced in order to increase the number of FLOPs per unit energy, at the price of lower speed.

8 comments, 2 min read

Tags: AI, AI Timelines

Steven Byrnes 24 Nov 2020 20:47 UTC
14 points
> At the moment, AI improves rapidly simply because current algorithms yield significant improvements when increasing compute. It is often better to double the compute than work on improving the algorithm. However, compute prices will decrease less rapidly in the future. Then, AI will need better algorithms. If these can not be found as rapidly as compute helped in the past, AI will not grow on the same trajectory any more. Progress slows. Then, a second AI winter can happen.
I kinda disagree with this, especially the first sentence. “Increasing compute” is indeed one thing that is happening in AI, and it’s in the headlines a lot, but it’s not the only thing happening in AI. Algorithmic innovations are happening now and have been happening all along. Like, 3 years ago, the Transformer had just been invented … 5 years ago there was no BatchNorm or ResNets …. In the area I’m most interested in (neocortex-like models), the (IMO) most promising developments are in a very early research-project-ish stage, maybe like deep neural nets were in the 1990s, probably years away from progressing to parallelized, hardware-accelerated, turn-key code that can even begin to be massively scaled.
- Polytopos 30 Nov 2020 6:22 UTC
  16 points
  Agreed. OpenAI did a study on trends in algorithmic efficiency. They found a 44x improvement in training efficiency on ImageNet over 7 years.
  
  https://openai.com/blog/ai-and-efficiency/
Tetraspace 30 Nov 2020 23:24 UTC
8 points
In The Age of Em, I was somewhat confused by the talk of reversible computing, since I assumed that the Landauer limit was some distant sci-fi thing, probably derived by doing all your computation on the event horizon of a black hole. That we’re only three orders of magnitude away from it was surprising, and it’s definitely something I need to give more consideration to. The future is reversible!
I did a back-of-the-envelope calculation about what a Landauer limit computer would look like to rejiggle my intuitions with respect to this, because “amazing sci-fi future” to “15 years at current rates of progress” is quite an update.
> Then, the lower limit is $n \times kT$ with $n \sim 10$ or $n \sim 100$ [...] A current estimate for the number of transistor switches per FLOP is $10^{6}$.
The peak of human computational ingenuity is of course the games console. When doing something very intensive, the PS5 consumes 200 watts and does 10 teraFLOPs ($10^{13}$ FLOPs). At the Landauer limit, that power would do $10^{23}$ bit erasures per second. The difference is $10^{10}$: 6 orders of magnitude from FLOPs-to-bit-erasure conversion, 1 order of magnitude from inefficiency, and 3 orders of magnitude from physical limits, perhaps.
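The console arithmetic can be checked directly. The following is a short Python sketch, assuming T = 300 K and taking the 200 W and 10 teraFLOPs figures at face value. The power budget allows about $7 \times 10^{22}$ bit erasures per second. That is a gap of roughly $10^{9.8}$ to the FLOP rate, consistent with the $10^{10}$ quoted.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K
T = 300.0           # assumed room temperature, K

PS5_WATTS = 200.0
PS5_FLOPS_PER_S = 1e13  # 10 teraFLOPs, as stated

# Landauer limit: each bit erasure costs at least kT ln 2 joules.
erasures_per_s = PS5_WATTS / (K_B * T * math.log(2))
gap = erasures_per_s / PS5_FLOPS_PER_S

print(f"Landauer-limited bit erasures/s at {PS5_WATTS:.0f} W: {erasures_per_s:.1e}")
print(f"gap to the FLOP rate: 10^{math.log10(gap):.1f}")
```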
- maximkazhenkov 19 Dec 2020 8:09 UTC
  11 points
  > 6 orders of magnitude from FLOPs to bit erasure conversion
  Does it take a million bit erasures to conduct a single floating point operation? That seems a bit excessive to me.
ChristianKl 24 Nov 2020 18:48 UTC
8 points
> Energy prices (in USD per kWh) may decrease in the future (solar? fusion?)
Solar costs do fall exponentially, if the last decades are any indication. If energy is your biggest cost and you are fine with not getting power 24/7/365, solar will give you cheap power.
There might also be a time when you want to fly your solar cells nearer to the sun to pick up more sunlight. Computers in space could be powered that way.
Roko 5 Apr 2023 8:29 UTC
2 points
I think that AI will just consume a lot more power moving forward rather than solving reversible computation. Simple calculation: computers as a whole consume about 3 GW in the US, i.e. about $10^{10}$ W (with a bit of wiggle room).

The total solar energy incident on the USA is about $10^{15}$ W when you take into account day/night, clouds, etc.

So there’s a factor of, say, $10^{4}$ to $10^{5}$ available just by capturing all that sunlight and using it for computation.
maximkazhenkov 17 Dec 2020 10:55 UTC
1 point
Wait a minute: does this mean that microprocessors have already far surpassed the switching energy efficiency of the human brain? That came as a surprise to me.
- Charbel-Raphaël 7 Aug 2022 23:41 UTC
  3 points
  Where do you read that?