So once an AI system trained end-to-end can produce roughly as much value per token as a human researcher produces per second of thinking, AI research will already be more than fully automated, since models emit tokens far faster and more cheaply than humans produce seconds of thought. It follows that when AI first contributes more to AI research than humans do, the average research progress from one token of output will still be significantly less than what an average human AI researcher produces in a second of thinking.
Here’s one piece of (weak) evidence from the current SOTA on SWE-bench:
"Median token usage per patch: 2.6 million tokens
90th percentile token usage: 11.82 million tokens"
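To make the comparison concrete, here is a rough back-of-the-envelope sketch in Python. It takes the median token figure quoted above and pairs it with an entirely hypothetical assumption about how long a human would spend producing a comparable patch; the human-time number and the premise that the two patches are of equal value are illustrative assumptions, not claims.

```python
# Illustrative back-of-the-envelope calculation, not a measurement.
# Source figure: ~2.6 million tokens for the median SWE-bench patch (quoted above).
# Hypothetical assumption: a human engineer would spend ~4 hours of focused
# work on a comparable patch.

TOKENS_PER_PATCH = 2.6e6          # median token usage per patch (from the quote)
ASSUMED_HUMAN_SECONDS = 4 * 3600  # hypothetical: 4 hours of human thinking

# If the AI-generated patch and the human patch delivered equal value, one token
# would carry this fraction of the value of one human second:
value_per_token_vs_human_second = ASSUMED_HUMAN_SECONDS / TOKENS_PER_PATCH

print(f"Tokens per patch:             {TOKENS_PER_PATCH:,.0f}")
print(f"Assumed human seconds:        {ASSUMED_HUMAN_SECONDS:,}")
print(f"Value per token / per second: {value_per_token_vs_human_second:.4f}")
# ~0.0055: under these assumptions, each token carries roughly 1/180th of the
# research value of one second of human thinking, consistent with the claim
# that at the crossover point, per-token value is well below per-second value.
```

The specific ratio moves around a lot with the assumed human time, but the qualitative point is robust to that choice: as long as a task consumes millions of tokens that a human would handle in hours, per-token value sits orders of magnitude below per-second value.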