I think that is a problem for the industry, but probably not an insurmountable barrier the way some commentators make it out to be.
The o-series of models may be able to produce new high-quality training data.
Sufficiently good reasoning approaches, plus existing base models and scaffolding, may be enough to get you to automating ML research.
One other thought: there's probably an upper limit on how good an LLM can get even with unlimited high-quality data, and I'd guess models would asymptotically approach that limit for a while before hitting it. But based on the reporting around GPT-5 and other next-gen models, I'd guess the current issue is a lack of data rather than an approach to some fundamental limit.
Fixed, thanks!