Yes. However, the idea before was ‘scaling is powerful/smart’. It is, but doing things this other way is apparently more powerful. So if you want powerful models, grab a gallon of compute and a gallon of data instead of two gallons of compute.*
This is probably a bad analogy, because at some point you’re going to want to a) increase the stuff you were leaving the same in your recipe, and then later b) mix stuff up in smaller batches than ‘all the ingredients’.