Will the experiment be run?

What is the experiment? What is the question?

I take a vision or language model which was cutting edge in 2000, and run it with a similar amount of compute/data to what’s typically used today.
Guess A. Is the difference (between 2000 and today) modern compute?
I take a modern vision or language model, calculate how much money it costs to train, estimate the amount of compute I could have bought for that much money in 2000, then train it with that much compute
Guess B. Is the difference (between 2000 and today) modern compute costs?
But the experiment doesn’t seem to be about A or B. More likely it’s about both:
Which is more important to modern ML performance (and in what domain?*):
Typical compute (today versus then)?
Or typical compute cost (today versus then)?
(Minor technical note: when comparing results from the past to results today, it might be impossible to go back in time and run these tests with a control group, but rather than taking ‘things weren’t as good back then’ for granted, that should also be tested for comparison. (Replicate earlier results.**)
This does admit other hypotheses.
For example, ‘the difference between 2020 and 2000 is that training used to take a long time, so if people set things up wrong, they didn’t get feedback for a long time. Perhaps modern compute enables researchers to end up with correctly set-up ML programs even when the code isn’t written right the first time.’)
A and B can be rephrased as:
Do we use more compute today, but spend ‘the same amount’?
Do we spend ‘more’ on compute today?
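To make the two budgets concrete, here is a minimal back-of-the-envelope sketch in Python. Every number in it (the FLOP count of a modern run, its dollar cost, and 2000-era price-performance) is a placeholder assumption chosen only to illustrate the distinction, not an estimate of real hardware.

```python
# Back-of-the-envelope comparison of the two budgets ("same compute" vs "same cost").
# Every number here is a placeholder assumption for illustration, not a measurement.

MODERN_TRAINING_FLOP = 1e21   # assumed total compute of a modern training run (FLOP)
MODERN_COST_USD = 1e6         # assumed dollar cost of that run
FLOP_PER_USD_2000 = 1e12      # assumed 2000-era price-performance (FLOP per dollar)

flop_per_usd_today = MODERN_TRAINING_FLOP / MODERN_COST_USD  # implied modern price-performance

# Guess A: give the 2000-era model the same amount of compute used today.
compute_matched_flop = MODERN_TRAINING_FLOP
cost_in_2000_usd = compute_matched_flop / FLOP_PER_USD_2000  # what that compute would have cost in 2000

# Guess B: give the 2000-era model only the compute the same dollar budget bought in 2000.
cost_matched_flop = MODERN_COST_USD * FLOP_PER_USD_2000

print(f"A (compute-matched): {compute_matched_flop:.1e} FLOP "
      f"(~${cost_in_2000_usd:,.0f} at 2000 prices)")
print(f"B (cost-matched):    {cost_matched_flop:.1e} FLOP for ${MODERN_COST_USD:,.0f}")
print(f"A/B = improvement in FLOP per dollar since 2000: "
      f"{flop_per_usd_today / FLOP_PER_USD_2000:.0e}x")
```

Whatever numbers are plugged in, budget A exceeds budget B by exactly the factor by which FLOP per dollar has improved since 2000, which is why the two guesses are genuinely different experiments.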
*This might be intended as a more general question, but the post asks about:
vision or language model[s].
**The most extreme version would be getting/recreating old machines and then re-running old ML stuff on them.
The underlying question I want to answer is: ML performance is limited by both available algorithms and available compute. Both of those have (presumably) improved over time. Relatively speaking, how taut are those two constraints? Has progress come primarily from better algorithms, or from more/cheaper compute?