Executive Summary
Using a dataset of 470 models of graphics processing units (GPUs) released between 2006 and 2021, we find that the number of floating-point operations per second per dollar (hereafter FLOP/s per $) doubles every ~2.5 years. For top GPUs, we find a slower rate of improvement (FLOP/s per $ doubles every 2.95 years), while for GPU models typically used in ML research, we find a faster rate of improvement (FLOP/s per $ doubles every 2.07 years). GPU price-performance improvements have generally been slightly slower than the 2-year doubling time associated with Moore’s law and much slower than what is implied by Huang’s law, yet considerably faster than was generally found in prior work on trends in GPU price-performance. Drawing on more or higher-quality data, our work aims to provide a more precise characterization of GPU price-performance trends, one that is more robust to justifiable changes in the analysis than previous investigations.
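A doubling time of this kind is commonly estimated by regressing the logarithm of price-performance on release date. The sketch below assumes an ordinary least-squares fit on log2(FLOP/s per $), which may differ from the exact procedure used for the estimates above. It runs on synthetic points rather than the real dataset, and the function name `doubling_time_years` is hypothetical.

```python
import math

def doubling_time_years(years, flops_per_dollar):
    """Estimate a doubling time from an OLS fit of log2(value) against year.

    Assumption: growth is roughly exponential, so log2(value) is linear in time.
    """
    logs = [math.log2(v) for v in flops_per_dollar]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    # Closed-form least-squares slope: covariance over variance.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    sxx = sum((x - mean_x) ** 2 for x in years)
    slope = sxy / sxx  # doublings per year
    return 1 / slope

# Synthetic illustration only (not the real dataset):
# values constructed to double every 2.5 years.
synthetic_years = list(range(2006, 2022))
synthetic_values = [1e7 * 2 ** ((y - 2006) / 2.5) for y in synthetic_years]
print(round(doubling_time_years(synthetic_years, synthetic_values), 2))
```

On real data the points scatter around the fitted line, so a point estimate like this would normally be reported with a confidence interval.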
Figure 1. Plots of FLOP/s and FLOP/s per dollar for our dataset and relevant trends from the existing literature
Trend | 2x time | 10x time | Metric |
Our dataset (n=470) | 2.46 years [2.24, 2.72] | 8.17 years [7.45, 9.04] | FLOP/s per dollar |
ML GPUs (n=26) | 2.07 years [1.54, 3.13] | 6.86 years [5.12, 10.39] | FLOP/s per dollar |
Top GPUs (n=57) | 2.95 years [2.54, 3.52] | 9.81 years [8.45, 11.71] | FLOP/s per dollar |
Our data FP16 (n=91) | 2.30 years [1.69, 3.62] | 7.64 years [5.60, 12.03] | FLOP/s per dollar |
Moore’s law | 2 years | 6.64 years | FLOP/s |
Huang’s law | 1.08 years | 3.58 years | FLOP/s |
CPU historical (AI Impacts, 2019) | 2.32 years | 7.7 years | FLOP/s per dollar |
Bergal, 2019 | 4.4 years | 14.7 years | FLOP/s per dollar |
Table 1. Summary of our findings on GPU price-performance trends and relevant trends in the existing literature with the 95% confidence intervals in square brackets.
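The 2x and 10x columns are two views of the same exponential rate: a 10x improvement takes log2(10) ≈ 3.32 doublings. The following is a minimal sketch of that conversion, with hypothetical helper names, applied to a few of the point estimates in Table 1.

```python
import math

def ten_x_time(doubling_time_years):
    """Years needed for a 10x improvement, given a doubling time in years."""
    return doubling_time_years * math.log2(10)

def annual_growth_factor(doubling_time_years):
    """Multiplicative improvement per year implied by a doubling time."""
    return 2 ** (1 / doubling_time_years)

# Point estimates (2x times, in years) copied from Table 1.
point_estimates = {
    "Our dataset": 2.46,
    "ML GPUs": 2.07,
    "Top GPUs": 2.95,
    "Moore's law": 2.0,
}

for name, t2 in point_estimates.items():
    print(f"{name}: 10x in {ten_x_time(t2):.2f} years, "
          f"x{annual_growth_factor(t2):.2f} per year")
```

Recomputing from the rounded 2x times can differ from the table in the last digit (for example 6.88 rather than 6.86 years for ML GPUs), presumably because the table's figures come from unrounded estimates.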
In future work, we intend to build on these results to produce projections of GPU price-performance and to investigate what our findings imply about the growth in dollar spending on computing hardware in machine learning.
We would like to thank Alyssa Vance, Ashwin Acharya, Jessica Taylor and the Epoch team for helpful feedback and comments.
In a narrow technical sense, this post still seems accurate, but in a more general sense it might have been slightly wrong or misleading.
In the post, we investigated different measures of FP32 compute growth and found that many of them were slower than Moore’s law would predict. This made me personally believe that compute might be growing slower than people thought and that most of the progress comes from throwing more money at larger and larger training runs. While most progress does come from investment scaling, I now think the true effective compute growth is probably faster than Moore’s law.
The main reason is that FP32 is just not the right thing to look at in modern ML, and we even knew this at the time of writing: it ignores tensor cores and lower precisions like FP16 or INT8.
I’m a little worried that people who read this post but don’t have any background in ML got the wrong takeaway, and we should have emphasized this difference even more at the time. We recently wrote a follow-up post about this: https://epochai.org/blog/trends-in-machine-learning-hardware
I feel like the new post does a better job at explaining where compute progress comes from.