Is this the consensus view? I think it’s generally agreed that software development has been sped up. A factor of two is an ambitious claim, but that’s what it looks like to me: I’ve measured three examples of computer vision programming, each taking an hour or two, by doing them by hand and then with machine assistance. The machines are dumb and produce results that require rewriting, but my own code is also inaccurate on a first try. I don’t have any references where people agree with me, and this may not apply to AI programming in general.
You ask about “anonymous reports of diminishing returns to scaling.” I have also heard these reports, directly from a friend who is a researcher inside a major lab. But note that this does not imply a diminished rate of progress, since there are ways to advance other than making LLMs bigger. o1 and o3 indicate the payoffs to be had from doing things other than pure scaling. If progress is available to cleverness, then the speed of advance need not depend on scaling.
The lack of reliability eats away a huge amount of productivity. Everything has to be double-checked, and with higher capabilities this becomes even harder: we need to think more about the subtle ways their output is wrong. Unknown unknowns are also always a factor. Still, if o3-type models can be trained on less verifiable problems without being insanely compute-heavy, then 2026 is actually a reasonable guess.