There is little reason to think that’s a big issue. A lot of data is semi-tagged, so some of the ML-generated data can be removed either by using those tags or by having newer models detect it. And in general, as long as the amount of ‘good’ data is also increasing, model quality will keep increasing even with some extra noise.