I would presume that the process of AI improvement can also be modelled as:
A.) Coming up with good research ideas.
B.) Finding the precise formulation of that idea that makes the most sense / works.
C.) Implementation of the idea.
If you claim that C) only "takes hours", then with the AI Coder it takes seconds instead (nowadays agents work correctly only 50-70% of the time, hence a programmer indeed has to spend those couple of hours).
Then the loop becomes tighter: a single iteration takes a few hours less.
Let's assume there's a very creative engineer who can come up with a couple of ideas a day.
What is the B step? Finding the formulation means e.g. getting the math equations, right? LLMs are becoming superhuman at math already this year. If they are superhuman, then the loop becomes tighter.
Then, instead of spending a day on an idea (a few hours of it on implementation), you test a bunch of ideas per day.
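To put purely illustrative numbers on it: if one iteration today is roughly 1 hour of ideation, 2 hours of formulation and 5 hours of implementation, that is about one idea per working day; if C) shrinks to minutes, an iteration takes ~3 hours and the same day fits 2-3 ideas, and automating B) as well pushes the throughput further still.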
Also, step A) can probably get automated too, with a framework in which you make the model read all the literature and propose combinations of ideas which you then filter. Each new model makes the proposals more relevant.
So all 3 steps get semi-automated (and the loop gradually tightens with each new model release), and the human's role boils down to filtering things out: the "taste" quality which Kokotajlo mentions.
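To make this concrete, here is a minimal sketch of what such a semi-automated loop might look like, assuming some model API hidden behind a placeholder `call_llm` function; every name here is hypothetical, not a description of an actual system:

```python
# Hypothetical sketch of the A/B/C loop described above; call_llm stands in for
# whatever real model API would be used, and just returns a canned string here.

def call_llm(prompt: str) -> str:
    return f"[model output for: {prompt[:40]}...]"

def propose_ideas(literature: list[str], n: int = 5) -> list[str]:
    # Step A: the model reads the literature and proposes combinations of ideas.
    return [call_llm(f"Combine ideas from {len(literature)} papers, proposal #{i}")
            for i in range(n)]

def formalize(idea: str) -> str:
    # Step B: turn an idea into a precise formulation (equations, losses, metrics).
    return call_llm(f"Write the precise formulation of: {idea}")

def implement_and_test(spec: str) -> str:
    # Step C: an AI coder implements and runs the experiment.
    return call_llm(f"Implement and run an experiment for: {spec}")

def human_filter(ideas: list[str]) -> list[str]:
    # The human "taste" step: a researcher keeps only the promising proposals.
    # In this sketch everything passes through unchanged.
    return ideas

def research_iteration(literature: list[str]) -> list[str]:
    ideas = human_filter(propose_ideas(literature))
    return [implement_and_test(formalize(idea)) for idea in ideas]

print(research_iteration(["paper A", "paper B"]))
```

The only place a human appears is the filtering step; everything else is a model call, which is the sense in which the loop tightens with each model release.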
Tobiasz B
I understand these two paths simply as:
- a scenario of aligned AI
- a scenario of unaligned AI
An aligned AI is, by definition, a machine whose values (~will) are similar to the values of humans.
If this is the case, then if people want something, the AI wants it too. If people want to be agentic, then they are agentic, because the AI wants that and allows them to be.
In the second scenario people become irrelevant. They get wiped out. The machine then proceeds with the realisation of its desires. Those desires are whatever people had injected into it. In this prediction, the desires and values are:
- scientific/AI research, coming from the agency properties (an LLM in a for loop? see the sketch after this list)
- making the impression of somebody friendly, coming from RLHF-like techniques in which the output of the LLM has to be accepted by various people and human-made criteria.
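As a purely illustrative aside, "an LLM in a for loop" usually means something like the sketch below, where the model is repeatedly fed its objective plus its own previous outputs; `call_llm` is again a hypothetical placeholder, not a real API:

```python
# Minimal, hypothetical "LLM in a for loop": repeated model calls that feed the
# model's previous outputs back in, which is what produces agent-like behaviour.

def call_llm(prompt: str) -> str:
    return f"[model output for: {prompt[:40]}...]"  # placeholder, not a real API call

def agent_loop(objective: str, steps: int = 5) -> list[str]:
    history: list[str] = []
    for _ in range(steps):
        prompt = f"Objective: {objective}\nActions so far: {history}\nNext action:"
        history.append(call_llm(prompt))  # the accumulated history acts as memory
    return history

print(agent_loop("improve the training recipe"))
```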