I think the evidence in the first part suggesting an abundance of compute is mostly explained by the fact that academics expect that we need ideas and algorithmic breakthroughs rather than simply scaling up existing algorithms, so you should update on that expectation rather than on this evidence, which is a downstream effect of it. If we condition on AGI requiring new ideas or algorithms, I think it is uncontroversial that we do not need huge amounts of compute to test out those new ideas.
The “we are bottlenecked on compute” argument should be taken as a statement about how to advance the state of the art in big unsolved problems in a sufficiently general way (that is, without encoding too much domain knowledge). Note that ImageNet is basically solved, so it does not fall in this category. At this point, it is a “small” problem and it’s reasonable to say that it has an overabundance of compute, since it requires four orders of magnitude less compute than AlphaGo (and probably Dota). For the unsolved general problems, I do expect that researchers use efficient training tricks where they can find them, and they probably optimize hyperparameters in some smarter way. For example, AlphaGo’s hyperparameters were tuned via Bayesian optimization.
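For concreteness, here is a minimal sketch of hyperparameter tuning via Bayesian optimization, using scikit-optimize’s gp_minimize. The objective, hyperparameter names, and search ranges below are made up purely for illustration; this is not AlphaGo’s actual tuning setup.

```python
# Minimal sketch of Bayesian hyperparameter optimization.
# NOTE: the objective and search space are hypothetical stand-ins;
# a real objective would be an expensive training or self-play run.
from skopt import gp_minimize
from skopt.space import Real

def validation_loss(params):
    learning_rate, exploration_constant = params
    # Toy quadratic standing in for "train a model and measure performance".
    return (learning_rate - 3e-4) ** 2 + (exploration_constant - 1.5) ** 2

search_space = [
    Real(1e-5, 1e-2, prior="log-uniform", name="learning_rate"),
    Real(0.5, 5.0, name="exploration_constant"),
]

result = gp_minimize(validation_loss, search_space, n_calls=30, random_state=0)
print("best params:", result.x, "best loss:", result.fun)
```

The point is just that each call to the objective stands in for an expensive training run, so a sample-efficient tuner like this beats a grid search when compute is the scarce resource.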
Particular narrow problems can be solved by adding domain knowledge, or applying an existing technique that no one had bothered to do before. Particular new ideas can be tested by building simple environments or datasets in which those ideas should work. It’s not surprising that these approaches are not bottlenecked on compute.
The evidence in the first part can be explained as follows, assuming that researchers are focused on testing new ideas:
New ideas can often be evaluated in small, simple environments that do not require much compute (a minimal sketch of such a testbed follows this list).
Any trick that you apply makes it harder to tell what effect your idea is having (since you have to disentangle it from the effect of the trick).
Many tricks do not apply in the domain that the new idea is being tested in. Supervised learning has a bunch of tricks that now seem to work fairly robustly, but this is not so with reinforcement learning.
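To illustrate the first point in the list above, here is a toy example of the kind of cheap testbed a new RL idea might be evaluated in: a made-up 1-D gridworld with tabular Q-learning. Everything here (environment, hyperparameters, the random behavior policy) is hypothetical and chosen only to show that such an experiment runs in a fraction of a second on a laptop.

```python
# Toy 1-D gridworld plus tabular Q-learning; trains almost instantly.
import numpy as np

N_STATES, GOAL = 10, 9          # agent starts at state 0; reward only at the goal
Q = np.zeros((N_STATES, 2))     # actions: 0 = step left, 1 = step right
alpha, gamma = 0.1, 0.99
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0
    while s != GOAL:
        a = int(rng.integers(2))  # uniformly random behavior policy (Q-learning is off-policy)
        s_next = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s_next == GOAL else 0.0
        # Standard Q-learning update; a new idea (say, a modified update rule)
        # could be dropped in here and evaluated almost immediately.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))  # learned greedy action per state: move right in every non-terminal state
```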
Jeremy Howard notes that he doesn’t believe large amounts of compute are required for important ML research, pointing out that many foundational discoveries were made with little compute.
I would assume that Jeremy Howard thinks we are bottlenecked on ideas.
Regarding the points about training efficiency and grid searches: this could just be an inefficiency in ML research, and all the major AGI progress will be made by a few well-funded teams at the boundaries of modern compute that have solved these problems internally.
This seems basically right. I’d note that there can be a balance, so it’s not clear that this is an “inefficiency”—you could believe that any actual AGI will be developed by well-funded teams like you describe, but they will use some ideas that were developed by ML research that doesn’t require huge amounts of compute. It still seems consistent to say “compute is a major driver of progress in AI research, and we are bottlenecked on it”.