I think this is true to an extent, but a more systematic analysis needs to back this up.
For instance, I recall quantization techniques working much better beyond a certain scale (though I can’t seem to find the reference...). It also seems important to validate that techniques for improving performance still hold at large scale. Finally, note that the frontier of scale is growing very fast, so even if these discoveries were made with relatively modest compute compared to the frontier, that is still a tremendous amount of compute!