I think you have the right idea, but it’s a mistake to conflate “needs a big corpus of data” with “needs lots of hardware”. Hardware helps (the faster training goes, the more experiments you can run), but a lot of the time the gating factor is the corpus itself.
For example, if you’re trying to train a neural net to solve the “does this photo contain a bird?” problem, you need a bunch of photos that vary at random on the bird/not-bird axis, and you need human raters to go through and tag each photo as bird/not-bird. There are many ways to lose here. Your variable of interest might be correlated with something boring (maybe all the bird photos were taken in the morning and all the not-bird photos in the afternoon, so the net learns lighting instead of birds). Or your raters might have to spend a lot of time on each photo: imagine you want beak detection instead of just bird/not-bird, so your raters have to attach metadata to every bird photo describing where the beak is.
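One cheap sanity check for the morning/afternoon kind of confound is to tabulate the label rate against a metadata field you suspect. Below is a minimal sketch, assuming each photo is recorded as a hypothetical `(is_bird, hour_taken)` pair; the function name and the noon cutoff are illustrative, not from any particular library.

```python
from collections import defaultdict


def bird_rate_by_period(photos):
    """Return {period: fraction of photos labelled 'bird'}.

    Assumes `photos` is a list of (is_bird, hour_taken) pairs, a hypothetical
    schema for this sketch; hours before 12 count as morning.
    """
    counts = defaultdict(lambda: [0, 0])  # period -> [bird_count, total_count]
    for is_bird, hour in photos:
        period = "morning" if hour < 12 else "afternoon"
        counts[period][0] += int(is_bird)
        counts[period][1] += 1
    return {p: birds / total for p, (birds, total) in counts.items()}


# A badly collected set: every bird photo was shot in the morning.
photos = [(True, 8), (True, 9), (True, 10), (False, 14), (False, 15), (False, 16)]
print(bird_rate_by_period(photos))
```

A split like 1.0 vs 0.0 means time of day alone predicts the label perfectly, so a net could score well without ever looking at a bird. This only catches confounds you thought to record, though. The ones you didn’t record are the ones that really hurt.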
The difference between hardware fast enough to fit many iterations into the time it takes to write a paper and hardware so slow that feedback is infrequent seems pretty relevant to how fast the software can progress.
New insights depend crucially on feedback from trying out the old ones.