I don’t think a doubling every 4 or 6 months is plausible. I don’t think a doubling on any fixed timescale is plausible, because I don’t think overall progress will be exponential. I think you could have exponential progress on thought generation, but this won’t yield exponential progress on performance. That’s what I was trying to get at with this paragraph:
My hot take is that the graphics I opened the post with were basically correct in modeling thought generation. Perhaps you could argue that progress wasn’t quite as fast as the most extreme versions predicted, but LLMs did go from subhuman to superhuman thought generation in a few years, so that’s pretty fast. But intelligence isn’t a singular capability; it’s a phenomenon better modeled as two capabilities, and increasing just one of them happens to have sub-linear returns on overall performance.
So far (as measured by the 7-card puzzle, which I think is a fair data point) I think we went from ‘no sequential reasoning whatsoever’ to ‘attempted sequential reasoning but basically failed’ (Jun 13 update) to now being able to do genuine sequential reasoning for the first time. And if you look at how DeepSeek does it, to me this looks like the kind of thing where I expect difficulty to grow exponentially with argument length. (Based on stuff like it constantly having to go back and double-check even when it got something right.)
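To make that “exponential in argument length” intuition concrete, here is a toy model of my own (the probability `p`, the restart-on-error assumption, and the `expected_attempts` helper are all made up for illustration, not anything measured from DeepSeek): if each reasoning step independently succeeds with probability p and any slip forces a recheck/restart of the whole chain, then the clean-completion probability decays like p^k and the expected amount of work grows exponentially in the number of steps k.

```python
# Toy model (assumed, not measured): each reasoning step succeeds
# independently with probability p. A k-step argument then completes
# cleanly with probability p**k, so the expected number of full
# attempts (geometric distribution) is (1/p)**k -- exponential in k.

def expected_attempts(k: int, p: float = 0.9) -> float:
    """Expected full attempts to get a k-step chain right, if any
    single-step error forces a recheck/restart of the whole chain."""
    return (1.0 / p) ** k

for k in (5, 10, 20, 40):
    print(f"{k:>2} steps -> ~{expected_attempts(k):.1f} expected attempts")
```

Under these made-up numbers the cost is mild for short chains and blows up for long ones, which is the shape of “difficulty grows exponentially with argument length”.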
What I’d expect from this is not a doubling every N months, but perhaps an ability to reliably do one more step every N months. I think this translates into above-constant returns on the “horizon length” scale (because I think humans need more than 2x time for 2x steps), but not exponential returns. A sketch of what that would look like is below.
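Here is that sketch under numbers I am making up purely for illustration (the exponent `ALPHA`, the cadence `N`, the baseline `BASE_MIN`, and the helper names are all assumptions, not claims from the post): the model gains one reliable step every N months, and human time on an n-step task scales super-linearly, say like n^1.5, which is one way to cash out “humans need more than 2x time for 2x steps”.

```python
# Sketch under assumed numbers: the model reliably handles steps(t)
# sequential steps after t months (+1 step every N months), while human
# time for an n-step task scales super-linearly as n**ALPHA
# ("more than 2x time for 2x steps"). All constants are illustrative.

ALPHA = 1.5   # assumed human-time exponent (> 1)
N = 3         # assumed months per additional reliable step
BASE_MIN = 5  # assumed human minutes for a 1-step task

def steps(t_months: float) -> float:
    return 1 + t_months / N                       # linear progress in step count

def horizon_minutes(t_months: float) -> float:
    return BASE_MIN * steps(t_months) ** ALPHA    # horizon in human time

for t in (0, 12, 24, 48, 96):
    print(f"t={t:>3} months: ~{steps(t):.0f} steps, "
          f"horizon ~{horizon_minutes(t):.0f} human-minutes")
```

In this toy version the horizon keeps growing faster than linearly, but each successive doubling takes longer than the last, which is the sense in which I mean better-than-constant returns with no fixed doubling time.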
I expect difficulty to grow exponentially with argument length. (Based on stuff like it constantly having to go back and double-check even when it got something right.)
Training of DeepSeek-R1 doesn’t seem to do anything at all to incentivize shorter reasoning traces, so it just rechecks again and again because, why not? It’s like taking an important 3-hour written test: if you’re done after 1 hour, it’s prudent to spend the remaining 2 hours obsessively verifying everything.