I’d be surprised if that latter part continued for several more years. At least for ImageNet, compute cost in dollars has not been a significant constraint (I expect the cost of researcher time far dominates it, even for the non-optimized implementations), so it’s not that surprising that researchers don’t put in the work needed to make it as fast and cheap as possible. Presumably there will be more effort along these axes as compute costs overtake researcher time costs.
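To put rough numbers on that, here's a back-of-the-envelope sketch in Python. Every figure in it (GPU count, cloud price, researcher time and salary) is an assumption I made up for illustration, not a measurement:

```python
# Back-of-the-envelope: compute cost vs. researcher-time cost for a single
# (non-optimized) ImageNet training run. Every number below is an
# illustrative assumption, not a measurement.

gpu_hours = 8 * 24                # assume 8 GPUs running for ~1 day
gpu_price_per_hour = 1.50         # assumed cloud price per GPU-hour, USD
compute_cost = gpu_hours * gpu_price_per_hour

researcher_weeks = 4              # assumed time to implement, debug, and babysit
researcher_cost_per_week = 5_000  # assumed fully-loaded cost, USD/week
researcher_cost = researcher_weeks * researcher_cost_per_week

print(f"compute cost:    ${compute_cost:,.0f}")     # -> $288
print(f"researcher cost: ${researcher_cost:,.0f}")  # -> $20,000
```

On those made-up numbers, researcher time dominates by roughly two orders of magnitude, so you'd need a much bigger run (or a much cheaper researcher) before it pays to optimize for compute.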
I think the more important hypothesis is:
If something is just barely possible today with massive compute, maybe it will be possible with much, much less compute very soon (e.g. <1 year).
I don’t think it really matters for the argument whether it’s a “small team” that improves the compute-efficiency, or the original team, or a different big team, or whatever. Just that it happens.
Anyway, is the hypothesis true? I would say it’s very likely if we’re talking about a pioneering new algorithm, because for a pioneering new algorithm we don’t yet have best practices for parallelization, GPU acceleration, clever shortcuts, etc. On the other hand, if a known, widely-used algorithm is just barely able to do something on the world’s biggest GPU cluster, then it might take longer before that thing becomes really easy and cheap for anyone to do. Like, maybe it will take a couple of years, instead of <1 year :-P
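To make the two timescales concrete, here's a toy Python sketch under an assumed steady efficiency trend. The 16-month doubling time is my assumption for illustration (roughly the rate one published estimate found for ImageNet training efficiency), not a measured constant:

```python
# Toy model: how the compute needed for a fixed result shrinks over time
# under a steady efficiency trend. The 16-month doubling time is an
# illustrative assumption; a pioneering new algorithm can improve much
# faster than this in its first year or two.

DOUBLING_MONTHS = 16  # assumed efficiency doubling time

def compute_needed(months_elapsed: float) -> float:
    """Fraction of the original compute needed after `months_elapsed`."""
    return 0.5 ** (months_elapsed / DOUBLING_MONTHS)

for months in (12, 24, 36):
    print(f"after {months:2d} months: {compute_needed(months):.2f}x")
# after 12 months: 0.59x
# after 24 months: 0.35x
# after 36 months: 0.21x
```

On a steady trend like that you only get ~2x per 16 months, which fits the “couple of years” case for mature algorithms; the “<1 year” case relies on the one-time gains (parallelization, GPU acceleration, shortcuts) available for pioneering ones.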
Small teams can also get cheap access to impressive results by buying them from large teams. A large team that faces competitors selling to many customers will be pushed to set a low price.
Agreed, and this also happens “for free” with openness norms, as the post suggests. I’m not strongly disagreeing with the overall thesis of the post, just with the specific point that small teams can reproduce impressive results with far fewer resources.