Moore: yes.
QC: AFAIK it’s irrelevant?
GPT-3: it used up a particular kind of overhang you might call the “small-scale industrial CS R&D budget hardware overhang”. (It would certainly be possible to make much greater than GPT-3-level progress, but you’d need vastly larger budgets: say, 10% of a failed erectile-dysfunction drug candidate, or 0.1% of the money it takes to run a failed European fusion reactor or particle collider.) So I continue to stand by the paradigm of my scaling hypothesis essay. As expected, we saw some imitation and catch-up, but no one created a model much bigger than GPT-3, never mind one >100x bigger, the way GPT-3 was relative to GPT-2-1.5b. This is because no one at the relevant corporations truly believes in scaling, wishes to commit the necessary resources, or feels that it’s near a crunchtime where there might be a rush to train a model at the edge of the possible; and OA itself has been resting on its laurels as it turns into a SaaS startup. (We’ll see what the Anthropic refugees choose to do with their $124m seed capital, but so far they appear to be making a relaxed start of it as well.)
The overhang GPT-3 used up should not be confused with other overhangs. There are many other hardware overhangs of interest: the hardware overhang of the experience curve where the cost halves every year or two; the hardware overhang of a distilled/compressed/sparsified model; the hardware overhang of the global compute infrastructure available to a rogue agent. The small-scale industrial R&D overhang is the relevant and binding one… for now. But the others become relevant later on, under different circumstances, and many of them keep getting bigger.
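To make the “cost halves every year or two” point concrete, here is a minimal sketch of how such an experience curve compounds. The function name `cost_after`, the normalized starting cost of 1.0, and the 10-year horizon are illustrative assumptions, not figures from the comment above.

```python
def cost_after(years: float, initial_cost: float, halving_time_years: float) -> float:
    """Cost of a fixed amount of compute after `years`, assuming the cost
    halves once every `halving_time_years` (a simple exponential decay)."""
    return initial_cost * 0.5 ** (years / halving_time_years)


# Hypothetical: normalize today's cost to 1.0 and look 10 years out,
# for the two halving times mentioned ("every year or two").
for halving_time in (1.0, 2.0):
    remaining = cost_after(10, 1.0, halving_time)
    print(f"halving every {halving_time:g}y: cost falls to {remaining:.4g} "
          f"of today's ({1 / remaining:.0f}x cheaper)")
```

Under these assumptions, the same fixed budget buys roughly 32x to 1024x more compute after a decade, which is why this overhang keeps growing even if no one spends more.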
Why would QC be irrelevant? Quantum systems don’t perform well on all tasks, but they generally work well for parallel tasks, right? And neural nets are largely parallel. QC isn’t to the point of being able to help yet, but especially if conventional computing becomes a serious bottleneck, it might become important over the next decade.
I think that the only known quantum speedup for relatively generic tasks is from Grover’s algorithm, which only gives a quadratic speedup. That might be significant some day, or not, depending on the cost of quantum hardware. When it comes to superpolynomial speed-ups, it is very much an active field of study which tasks are relevant, and as far as we know it’s only some very specialized tasks like integer factoring. A bunch of people are trying to apply QC to ML but AFAIK it’s still anyone’s guess whether that will end up being significant.
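As a rough illustration of what “only a quadratic speedup” means, the sketch below compares oracle-query counts for unstructured search with one marked item among N. It assumes the standard estimates of about (N+1)/2 expected classical queries and about (π/4)·√N Grover iterations; the function names are made up for this example, and it says nothing about the relative cost per query on real quantum hardware.

```python
import math


def classical_queries(n: int) -> float:
    """Expected queries to find one marked item among n by checking
    items in random order without repetition."""
    return (n + 1) / 2


def grover_queries(n: int) -> int:
    """Approximate number of Grover iterations for one marked item
    among n: floor(pi/4 * sqrt(n))."""
    return math.floor(math.pi / 4 * math.sqrt(n))


for n in (10**6, 10**12):
    c = classical_queries(n)
    g = grover_queries(n)
    print(f"N={n:.0e}: classical ~{c:.3g} queries, Grover ~{g} queries, "
          f"ratio ~{c / g:.3g}")
```

The ratio grows only like √N, so whether it ever matters depends on how much slower or more expensive each quantum query is than a classical one.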
And some of the past QC claims for ML have not panned out. Like, I think there was a Quantum Monte Carlo approach claimed to be potentially useful for ML, which could be run on cheaper QC archs, but then it turned out to be doable classically...? In any case, I have been reading about QCs all my life, and they have yet to become relevant to anything I care about; and I assume Scott Aaronson will alert us should they suddenly become relevant to AI/ML/DL, so the rest of us should go about our lives until that day.