Intuitions by ML researchers may get progressively worse concerning likely candidates for transformative AI
Epistemic status: Explorative. Read the results more as a sketch of a possible issue than a proper derivation of reliable values. The research behind the exact numbers is limited and the calculations are done without error propagation. I'd rather get the idea out there than wait until I find the time to do it properly.
This post sketches a concern I had recently, but also points out one reason why it isn't as bad as it may seem.
Model
In short, the insight comes from the fact that the amount spent on training the largest ML systems scales faster than what the median ML researcher spends on training their own systems, giving most ML researchers an increasingly skewed view of how the best ML systems behave.
Current compute trends give a doubling time for the training compute of the largest systems of either 10 months or 6 months. To translate this into growth in funding for these systems, we have to estimate how the cost of compute changes over time.
As usual, we can consider Moore's law with its 24-month doubling time. We can also consider recent trends in GPU price per FLOP/s, which suggest that for half precision, FLOP/$ has a doubling time of 18 months[1].
This would mean that funding for the largest ML systems is doubling at least every 22 months, since

$$\frac{2^{t/10}}{2^{t/18}} = 2^{t\left(\frac{1}{10}-\frac{1}{18}\right)} \approx 2^{t/22}.$$

(With the faster 6-month compute doubling time, funding would instead double every $1/(1/6 - 1/18) = 9$ months, hence "at least".)
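As a sanity check, here is a minimal Python sketch of this rate arithmetic (the function name is mine, purely illustrative):

```python
# Doubling time of the ratio of two exponentially growing quantities:
# if compute doubles every t_num months and FLOP/$ doubles every t_den
# months, spending grows as 2^(t/t_num) / 2^(t/t_den), so its doubling
# time satisfies 1/t_spend = 1/t_num - 1/t_den.

def ratio_doubling_time(t_num: float, t_den: float) -> float:
    return 1 / (1 / t_num - 1 / t_den)

print(ratio_doubling_time(10, 18))  # 22.5 -> the ~22 month figure above
print(ratio_doubling_time(6, 18))   # 9.0  -> with the faster compute trend
```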
The question that remains is how the spending of most ML researchers on training their ML systems scales. For this I don't have any good numbers. However, I found some old data suggesting that US national funding towards Information and Intelligent Systems (the relevant subgroup) has doubled about every 22 months, which would mean that the same scaling factor applies as for the largest models.
However, I doubt that the number of ML researchers has remained constant during this time. In fact, the number of peer-reviewed publications on the topic of AI seems to have doubled every 25 months[2]. While it could be that existing ML researchers are simply publishing exponentially more articles, I expect it is rather the number of researchers that is increasing. If the number of ML researchers doubles every 30 months[3], then the spending per ML researcher instead doubles only every 80-90 months or so.
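Spelling out the arithmetic behind that figure: subtracting the growth rates gives

$$\frac{1}{22} - \frac{1}{30} = \frac{4}{330} = \frac{1}{82.5},$$

i.e. per-researcher spending doubles roughly every 82.5 months, in the 80-90 month range above.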
This would mean that the size of the largest ML models doubles relative to the size of the models used by the median ML researcher every 30 months, and the experience of most ML researchers grows exponentially less relevant to the largest systems.
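(The 30 months follow from the same rate subtraction once more: $\frac{1}{22} - \frac{1}{82.5} = \frac{1}{30}$.)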
Why this is not the full picture
The above model assumes that ML researchers only use the ML systems they have trained themselves. This is not the full picture, as many trained systems can be accessed by researchers through other means. As such, expectations for trained systems will perhaps not fall behind exponentially.
However, I still think that many important intuitions are formed by training your own ML models and experimenting with things yourself. It is this type of experience that I fear will move further and further away from the class of systems that are the most likely candidates for early Transformative AI.
[1] Using 95th percentile active prices and counting FMA as two FLOP, the same way as Sevilla et al.

[2] This is a rough estimate from Figure 1.1.1b; the doubling rate would be slower if I estimated the rate starting from the beginning of Sevilla et al.'s "Deep Learning Era" instead of from the start of the latest increase.

[3] This estimate is conservative, to take into consideration that existing researchers probably increase their rate of publication on average as they get more senior.