The next AI winter will be due to energy costs
Summary: We are 3 orders of magnitude from the Landauer limit (in calculations per kWh). After that, progress in AI cannot come from throwing more compute at known algorithms. Instead, new methods must be developed. This may cause another AI winter, in which the rate of progress decreases.
Over the last 8 decades, the energy efficiency of computers has improved by 15 orders of magnitude. Chips manufactured in 2020 feature 16 bn transistors on a 100 mm² area. The switching energy per transistor is only 3×10⁻¹⁸ J (see Figure). This remarkable progress brings us close to the theoretical limit of energy consumption for computations, the Landauer principle: “any logically irreversible manipulation of information, such as the erasure of a bit or the merging of two computation paths, must be accompanied by a corresponding entropy increase in non-information-bearing degrees of freedom of the information-processing apparatus or its environment”.
Figure: Switching energy per transistor over time. Data points from Landauer (1988), Wong et al. (2020), own calculations.
The Landauer limit of kT ln(2) is, at room temperature, about 3×10⁻²¹ J per operation. Compared to this, 2020 chips (TSMC 5 nm node) consume a factor of 1,175x as much energy per switch. Yet, after improving by 15 orders of magnitude, we are getting close to the limit: only about 3 orders of magnitude of improvement are left. A computation which costs 1,000 USD in energy today may eventually cost as little as 1 USD (assuming the same price in USD per kWh). Beyond that, further order-of-magnitude improvements of classical (irreversible) computers are forbidden by physics.
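As a sanity check on these round numbers, here is a minimal Python sketch that evaluates kT ln(2) at an assumed room temperature of 300 K and compares it with the 3×10⁻¹⁸ J switching energy quoted above; the rounded inputs are the only assumptions.

```python
import math

# Minimal sketch: evaluate the Landauer limit at room temperature and
# compare it with the 2020 switching energy quoted in the text.
k_B = 1.380649e-23        # Boltzmann constant, J/K
T = 300.0                 # assumed room temperature, K

E_landauer = k_B * T * math.log(2)      # ~2.9e-21 J per bit erasure
E_switch_2020 = 3e-18                   # J per transistor switch (from the text)

print(f"Landauer limit at {T:.0f} K: {E_landauer:.2e} J")
# With these rounded inputs the headroom is ~1e3, i.e. about 3 orders of magnitude:
print(f"Headroom factor: {E_switch_2020 / E_landauer:,.0f}x")

# Cost scaling at a fixed electricity price: what 1,000 USD of energy buys
# today would cost roughly this much at the limit.
print(f"1,000 USD today -> {1000 * E_landauer / E_switch_2020:.2f} USD at the limit")
```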
At the moment, AI improves rapidly simply because current algorithms yield significant improvements when given more compute. It is often better to double the compute than to work on improving the algorithm. However, compute prices will decrease less rapidly in the future. Then, AI will need better algorithms. If these cannot be found as quickly as compute gains arrived in the past, AI will no longer grow on the same trajectory. Progress slows, and a second AI winter can happen.
As a practical example, consider the training of GPT-3, which required 3×10²³ FLOPs. Performed on V100 GPUs (12 nm node), this training would have cost about 5 million USD at market prices (not energy prices). The pure energy cost would have been about 350k USD (assuming V100 GPUs at 300 W for 7 TFLOP/s, and 10 ct/kWh). With simple scaling, a kT-limit computer delivers about 10²² FLOPs per USD (or 10²⁸ FLOPs for 1 million USD, and 10³¹ FLOPs for 1 bn USD in energy). With such a computer, one could easily imagine scaling up by 1,000x and training a GPT-4, and perhaps even a GPT-5. Beyond that, new algorithms (and/or a Manhattan-project-level effort) are required.
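For anyone who wants to reproduce these figures, the following back-of-the-envelope sketch uses exactly the inputs stated above: 3×10²³ FLOPs, 300 W per V100 at 7 TFLOP/s, 10 ct/kWh, and 10⁶ transistor switches per FLOP (the latter is discussed under the caveats at the end).

```python
# Back-of-the-envelope sketch of the GPT-3 energy numbers quoted above.
FLOPS_TOTAL = 3e23          # training compute for GPT-3, FLOPs
GPU_POWER_W = 300.0         # V100 board power, W
GPU_THROUGHPUT = 7e12       # assumed sustained FLOP/s per V100
PRICE_PER_KWH = 0.10        # USD
J_PER_KWH = 3.6e6

# Energy cost of training on V100s
energy_J = FLOPS_TOTAL / GPU_THROUGHPUT * GPU_POWER_W
energy_kwh = energy_J / J_PER_KWH
print(f"V100 energy cost: {energy_kwh * PRICE_PER_KWH:,.0f} USD")   # ~357,000 USD

# FLOPs per USD at the Landauer (kT) limit, assuming 1e6 switches per FLOP
E_PER_SWITCH = 3e-21        # J per switch, ~kT ln(2) at room temperature
SWITCHES_PER_FLOP = 1e6
flops_per_usd = (J_PER_KWH / PRICE_PER_KWH) / (E_PER_SWITCH * SWITCHES_PER_FLOP)
print(f"kT-limit: {flops_per_usd:.1e} FLOPs per USD")                # ~1.2e22
```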
Following the current trajectory of node shrinks in chip manufacturing, we may reach the limit in about 20 years.
Arguments that the numbers given above are optimistic:
A kT-type computer assumes that all energy goes into gate flips: no parasitic losses exist, and no interconnects are required. In practice, only part of the energy goes into gate flips. The practical lower limit is then n×kT with n∼10 or n∼100; in that case, the winter would begin in about 10 years rather than 20 (see the sketch below).
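A rough timeline sketch, under the simplifying assumption that the remaining ~3 orders of magnitude close at a constant rate over the ~20 years mentioned above:

```python
import math

# If ~3 orders of magnitude of headroom take ~20 years to close (constant-rate
# assumption), how long until we hit a practical floor of n*kT instead of 1*kT?
HEADROOM_OOM = 3.0          # orders of magnitude above kT today
YEARS_TO_LIMIT = 20.0       # years to close that gap at the current pace
years_per_oom = YEARS_TO_LIMIT / HEADROOM_OOM

for n in (10, 100):
    remaining_oom = HEADROOM_OOM - math.log10(n)    # headroom left above n*kT
    print(f"floor at {n:>3}*kT -> ~{remaining_oom * years_per_oom:.0f} years")
# floor at  10*kT -> ~13 years
# floor at 100*kT -> ~7 years
```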
Arguments that the numbers given are pessimistic:
The waste heat of a classical computer is typically dissipated into the environment (eventually, into space), often at additional cooling cost. In principle, one could recover part of this waste heat with a heat pump; the gain is limited by the Carnot efficiency, which is typically a factor of a few.
Energy prices (in USD per kWh) may decrease in the future (solar? fusion?).
If reversible computers could be made, the Landauer limit would not apply. From my limited understanding, it is presently unclear whether such devices can be built in a practically useful form.
I do not understand the impact of quantum computing on AI, or whether such a device can be built in a practically useful form.
Other caveats:
To improve speed, chips use more transistors than are minimally required to perform calculations. For example, large die areas are filled with caches. A current estimate for the number of transistor switches per FLOP is 10⁶. This number can in principle be reduced to increase the number of FLOPs per unit of energy, at the price of lower speed.
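To illustrate this trade-off, here is a small sketch of energy per FLOP at the kT switching energy for different switch counts; only the 10⁶ figure comes from the estimate above, the smaller counts are purely illustrative.

```python
# Energy per FLOP at the kT switching energy, for different assumed numbers
# of transistor switches per FLOP. Only the 1e6 figure is from the text;
# the smaller counts are illustrative.
E_PER_SWITCH = 3e-21     # J per switch, ~kT ln(2) at room temperature
J_PER_KWH = 3.6e6

for switches_per_flop in (1e6, 1e4, 1e2):
    e_flop = switches_per_flop * E_PER_SWITCH       # J per FLOP
    flops_per_kwh = J_PER_KWH / e_flop
    print(f"{switches_per_flop:.0e} switches/FLOP -> "
          f"{e_flop:.1e} J/FLOP, {flops_per_kwh:.1e} FLOPs per kWh")
```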