A double exponential model seems very questionable. Is there any theoretical reason why you chose to fit your model with a double exponential? When fitting it, did you take into consideration the fundamental limits of computation? One cannot engineer transistors smaller than atoms, and we are approaching that limit on transistor size, so one should not expect much more improvement in the performance of computational hardware from shrinking alone. We can add more transistors to a chip by stacking layers (I don’t know how this would be manufactured, but 3D space has a lot of room), but the important thing is energy efficiency, and stacking more layers does not make the individual transistors any more efficient. With more layers in a 3D chip, most transistors will simply have to be off most of the time to keep the heat manageable, so 3D chips provide only a limited improvement.
Landauer’s principle states that to erase a bit of information in computing, one must spend at least k·T·ln(2) energy, where k is Boltzmann’s constant, T is the temperature, and ln(2) ≈ 0.693. Here k ≈ 1.38×10⁻²³ J/K (joules per kelvin), so this is not a lot of energy at room temperature. As the energy efficiency of computation approaches Landauer’s limit, one runs into problems such as thermal noise. Realistically, one should expect to spend more than 100·k·T per bit erased in order to overcome thermal noise. If one tries to get around Landauer’s limit using reversible computation, then the process of computation becomes more complicated, so with reversible computation one trades energy efficiency per bit operation against the number of operations performed and the amount of space used in the computation. Progress in computational hardware capabilities will therefore slow down as one moves from classical computation to reversible computation. There are also ways of cutting the energy cost of erasing information from about 100·k·T per bit to something much closer to k·T·ln(2), but they look like a complicated engineering challenge.
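To make the scale concrete, here is a quick back-of-the-envelope calculation; the choice of room temperature T = 300 K is my own illustrative assumption:

```python
import math

k_B = 1.380649e-23  # Boltzmann constant, J/K
T = 300.0           # room temperature in kelvin (assumed for illustration)

# Landauer's minimum energy to erase one bit: k*T*ln(2)
landauer = k_B * T * math.log(2)

# ~100 kT, a more realistic floor once thermal noise is accounted for
practical = 100 * k_B * T

print(f"Landauer limit at {T:.0f} K: {landauer:.3e} J per bit")
print(f"~100 kT practical floor:   {practical:.3e} J per bit")
```

The two numbers come out around 2.9×10⁻²¹ J and 4.1×10⁻¹⁹ J per bit, which is why erasure energy is negligible today but becomes the binding constraint as efficiency improves by many orders of magnitude.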
The Margolus-Levitin theorem states that it takes time at least h/(4E) to go from a quantum state to an orthogonal quantum state (by flipping a bit, one transforms a state into an orthogonal state), where h is Planck’s constant (h ≈ 6.626×10⁻³⁴ J·s, joules times seconds) and E is the energy. There are other fundamental limits to the capabilities of computation as well.
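For a sense of scale, a short sketch of the bound; the 1 eV example energy is an arbitrary illustrative choice, not anything from the discussion above:

```python
h = 6.62607015e-34  # Planck constant, J*s

def min_flip_time(E: float) -> float:
    """Margolus-Levitin bound: minimum time (seconds) to evolve to an
    orthogonal quantum state, given average energy E in joules."""
    return h / (4 * E)

# Illustrative only: one electronvolt of average energy
E = 1.602176634e-19  # 1 eV in joules
print(f"minimum time at 1 eV: {min_flip_time(E):.3e} s")
```

At 1 eV the bound works out to roughly a femtosecond per orthogonal transition, so the limit is far from binding for current hardware, but it caps how far a fixed energy budget can be pushed.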
As I remarked in other comments on this post, this is a plot of price-performance. The denominator is price, which can fall very fast. Potentially, as the demand for AI inference ramps up over the coming decade, the price of chips falls fast enough to drive this curve without chip speed growing nearly as fast. It is primarily an economic argument, not a purely technological one.
For the purposes of forecasting, and understanding what the coming decade will look like, I think we care more about price-performance than raw chip speed. This is particularly true in a regime where both training and inference of large models benefit from massive parallelism. That means you can scale by buying more chips, and from a business or consumer perspective you benefit if those chips get cheaper and/or faster at the same price.