AI and Efficiency
Link post
The amount of compute required to train neural networks to SOTA (state-of-the-art) performance on some tasks has been decreasing at an exponential rate. The efficiency of training AI models appears to be improving faster than Moore's law.
From June 2012 to May 2019, the amount of compute needed to train a neural network to AlexNet-level performance fell 44x (halving every 16 months). Over the same period, Moore's Law would have provided only an 11x increase in available compute.
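As a sanity check on those headline numbers, here is my own back-of-the-envelope arithmetic (the ~83-month span and the log-based doubling-time formula are my assumptions, not figures from the article):

```python
import math

# Back-of-the-envelope check of the headline numbers.
# Assumption (mine, not the article's): June 2012 to May 2019 is ~83 months.
months = 83

# A 44x drop in compute needed for AlexNet-level performance implies this halving time:
halving_time = months / math.log2(44)
print(f"Implied halving time: {halving_time:.1f} months")  # ~15 months, i.e. roughly 16

# Moore's law (doubling every ~24 months) applied over the same span:
moores_gain = 2 ** (months / 24)
print(f"Moore's-law gain over the same period: {moores_gain:.1f}x")  # ~11x
```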
There are other examples of exponential gains in efficiency. From the article:
We saw a similar rate of training efficiency improvement for ResNet-50 level performance on ImageNet (17-month doubling time). We saw faster rates of improvement over shorter timescales in Translation, Go, and Dota 2:
Within translation, the Transformer surpassed seq2seq performance on English to French translation on WMT'14 with 61x less training compute 3 years later.
We estimate AlphaZero took 8x less compute to get to AlphaGo Zero level performance 1 year later.
OpenAI Five Rerun required 5x less training compute to surpass OpenAI Five (which beat the world champions, OG) 3 months later.
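Applying the same arithmetic to these quoted factors gives the implied doubling times below. These are my own calculations, not figures reported in the article:

```python
import math

# Implied efficiency doubling times for the shorter-timescale examples.
# My own calculations from the quoted factors, not values from the article.
examples = {
    "Transformer vs. seq2seq (61x in 3 years)": (61, 36),
    "AlphaZero vs. AlphaGo Zero (8x in 1 year)": (8, 12),
    "OpenAI Five Rerun vs. OpenAI Five (5x in 3 months)": (5, 3),
}

for name, (factor, months) in examples.items():
    doubling_time = months / math.log2(factor)
    print(f"{name}: efficiency doubled every ~{doubling_time:.1f} months")
```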