I think the labs might well be rational in focusing on this sort of “handheld automation”, just to enable their researchers to code experiments faster and in smaller teams.
My mental model of AI R&D is that it is bottlenecked by roughly three things: compute, engineering time, and the “dark matter” of taste and feedback loops on messy research results. I can certainly imagine a model of lab productivity where the best way to accelerate through the entirety of 2025 is improving handheld automation. Say, the core paradigm is fixed; but inside that paradigm, the research team has more promising ideas than they have time to implement and try out in smaller-scale experiments; and they really do not want to hire more people.
If you model the AI lab as a fundamental unit that wants to increase its velocity, and that works on whatever makes it faster, it’s plausible that the labs are fully aware of how bad model performance is on research taste, and are still not making a mistake by ignoring that “dark matter” right now. They will work on it once they are faster.
GPT-4o’s drawings of itself as a person are remarkably consistent: it’s more or less always a similar-looking white male in his late 20s with brown hair, often sporting facial hair and glasses, unless you specify otherwise. All the men it generates might as well be brothers. I reproduced this on two ChatGPT accounts with clean memory.
By contrast, its drawings of itself when it does not depict itself as a person are far more diverse: a wide range of robot designs and abstract humanoids, often featuring the OpenAI logo as a head, or the word “GPT” on the chest.