So once an AI system trained end-to-end can produce as much value per token as a human researcher produces per second of thinking, AI research will be more than fully automated (since tokens can be generated far faster and more cheaply than human seconds of thought). This means that, when AI first contributes more to AI research than humans do, the average research progress produced by 1 token of output will be significantly less than what an average human AI researcher produces in a second of thinking[6] (see the back-of-envelope sketch after the list below). Instead, the collective’s intelligence will largely come from a combination of things like:
Individual systems “thinking” for a long time, churning through many more explicit thoughts than a skilled human would need to solve a problem.[7]
Splitting tasks into more granular subtasks and delegating them to other AI systems.
Generating huge numbers of possible solutions, and evaluating them all before picking one.
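To make the quantitative claim above concrete, here is a rough back-of-envelope sketch. All of the numbers (researcher counts, tokens per second, value per token) are illustrative assumptions I'm picking for the example, not estimates from this post; the point is only to show why the crossover happens at a value per token well below one human-second of thinking.

```python
# Back-of-envelope sketch: when does an AI collective's research contribution
# exceed that of a pool of human researchers? All numbers below are made-up
# illustrative assumptions, not figures from the post.

def total_contribution(workers, output_units_per_second, value_per_unit, seconds):
    """Total research progress = workers * output rate * value per unit * time."""
    return workers * output_units_per_second * value_per_unit * seconds

# Hypothetical humans: 10,000 researchers, each producing 1 "second of thinking"
# per second, with the value of one human-second normalized to 1.
human_progress = total_contribution(
    workers=10_000, output_units_per_second=1, value_per_unit=1.0, seconds=1
)

# Hypothetical AI collective: 100,000 parallel instances, each emitting
# 100 tokens per second. Vary how much research progress one token is worth,
# measured in human-second-equivalents.
for value_per_token in [1.0, 0.1, 0.01, 0.001]:
    ai_progress = total_contribution(
        workers=100_000, output_units_per_second=100,
        value_per_unit=value_per_token, seconds=1,
    )
    print(f"value/token = {value_per_token:>6}: "
          f"AI/human contribution ratio = {ai_progress / human_progress:,.1f}")

# Under these assumptions: if a token were worth as much as a human second,
# the AI collective would out-contribute the humans ~1000x ("more than fully
# automated"). The crossover point (ratio ~ 1), where AI first contributes more
# than humans, instead sits at roughly 0.001 human-seconds of value per token.
```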
Most obviously, the token-by-token output of a single AI system should be quite easy for humans to supervise and monitor for danger. It will rarely contain implicit cognitive leaps that a human couldn’t have generated themselves. (Cf. the visible thoughts project and the translucent thoughts hypothesis.)
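As a very rough sketch of what step-by-step oversight of such legible output could look like (this is my illustration, not a proposal from the post; `looks_dangerous` is a hypothetical placeholder for whatever check the overseer uses, e.g. human review or a trusted classifier):

```python
# Minimal sketch of thought-by-thought oversight, assuming the AI's explicit
# reasoning steps are human-legible text with no large implicit leaps.
from typing import Callable, Iterable, List


def monitored_run(
    thoughts: Iterable[str],
    looks_dangerous: Callable[[str], bool],
) -> List[str]:
    """Accept each explicit thought into the transcript only after it passes the check.

    Because each step is a small, human-readable chunk of text, the oversight
    check can be applied to every step rather than only to the final answer.
    """
    transcript: List[str] = []
    for thought in thoughts:
        if looks_dangerous(thought):
            raise RuntimeError(f"Flagged thought, halting run: {thought!r}")
        transcript.append(thought)
    return transcript


# Toy usage, with a trivial keyword rule standing in for a real monitor.
if __name__ == "__main__":
    demo_thoughts = [
        "Decompose the benchmark into data loading and evaluation.",
        "Try a smaller learning rate and rerun the ablation.",
    ]
    print(monitored_run(demo_thoughts, lambda t: "exfiltrate" in t.lower()))
```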
I think the paper summarized in this twitter thread provides quite strong theoretical arguments in favor of these points.