If it takes a human 1 month to solve a difficult problem, it seems unlikely that a less capable human who can’t solve it within 20 years of effort would still succeed given 40 years.
I suspect that your intuition about human beings is misled because, in humans, “stick-to-it-ness” and intelligence (g-factor) are strongly positively correlated. That is, almost all cases of human genius involve both very high IQ and a long time spent thinking about the problem of interest. If anything, inference compute is probably the more important factor among human geniuses, since it is unlikely that even the smartest human is as much as 2x above the average in raw FLOPs (human brains are all roughly the same size).
The human reasoning I’m comparing against also uses long reasoning traces, so unlocking that capability is part of the premise (many kinds of test-time compute parallelize, but not in this case, so the analogy is narrower than test-time compute in general). The question is how much you gain from reasoning traces 3 orders of magnitude longer still, beyond the first additional 3 orders of magnitude, while thinking at a quality below that of the reference human. Current o1-like post-training doesn’t yet promise that scaling goes that far: such traces won’t even fit in a context window, and who knows whether the scaling continues once workarounds for that are in place.
Human experience suggests to me that in humans, scaling doesn’t go that far either. When a problem can be effectively reduced to simpler problems, it wasn’t that difficult after all. And so the ratchet of science advances at a linear rather than logarithmic speed, within the bounds of human-feasible difficulty. The 300x of excess speed is a lot to overcome for a slowdown due to reasoning traces orders of magnitude longer than anything feasible in human experience, on a single problem that resists more modular analysis.
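To make that arithmetic explicit: the 300x serial-speed figure is from the discussion above, and the trace-length multipliers below are a toy model, not a claim about any particular system.

```python
# Toy model: an AI with a ~300x serial-speed advantage over a human
# (figure from the discussion above), which needs a reasoning trace
# k orders of magnitude longer than the human's to solve the problem.
speed_advantage = 300

for k in range(1, 5):
    trace_multiplier = 10 ** k
    # Net wall-clock time relative to the human: longer trace / faster thinking.
    net_time_ratio = trace_multiplier / speed_advantage
    print(f"{k} OOM longer trace -> {net_time_ratio:g}x the human's wall-clock time")
```

Under these toy numbers, the speed advantage absorbs roughly the first 2.5 OOM of extra trace length; beyond that, the AI is slower than the human in wall-clock terms.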
For biological reasons, humans do not think about problems for thousands of years. A human who gives a problem a good 2-hour think is within 3 OOM of a human who spends their entire career working on a single problem.
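As a sanity check on that gap, here is the back-of-envelope version. The career-length and hours-per-year figures are illustrative assumptions, chosen only to show the order-of-magnitude count.

```python
import math

# Illustrative assumptions (not data): a 2-hour think vs. a career's worth
# of focused thought on one specific problem.
short_think_hours = 2
career_years = 30              # hypothetical research career
focused_hours_per_year = 50    # deep thought on that one problem, hypothetical

career_hours = career_years * focused_hours_per_year  # 1500 hours
gap_in_oom = math.log10(career_hours / short_think_hours)
print(f"career vs. 2-hour think: ~{gap_in_oom:.1f} OOM")  # ~2.9 OOM
```

Even with generous assumptions the ratio stays under 3 orders of magnitude, which is the point: human serial thinking time on a single problem spans a narrow range.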
AI researchers have found that inference compute can be traded for training compute across a wide variety of domains, including image generation, robotic control, game playing, computer programming, and solving math problems.
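A minimal sketch of one common form of this trade is best-of-N sampling with a verifier — note this is the parallelizable kind of test-time compute, distinct from the serial long traces discussed above. All numbers here are hypothetical, and the model assumes independent, cheaply verifiable attempts.

```python
# Best-of-N sampling: a weaker (cheaper-to-train) model with per-attempt
# success probability p can match a stronger model by drawing more samples
# at inference time, provided solutions can be verified.
# Numbers below are hypothetical.
def best_of_n_success(p: float, n: int) -> float:
    """Probability that at least one of n independent attempts succeeds."""
    return 1 - (1 - p) ** n

weak_p = 0.02   # weak model: cheap training, low per-attempt success rate
strong_p = 0.5  # strong model: expensive training, one attempt

for n in (1, 10, 100):
    print(f"weak model, n={n:>3}: {best_of_n_success(weak_p, n):.2f}")
# At n=1 the weak model is far behind (0.02 vs 0.50); by n=100 samples of
# extra inference compute it has overtaken: 1 - 0.98**100 ≈ 0.87.
```

The design point is that this curve saturates: each extra OOM of samples buys less, which is one reason the serial-trace question above does not reduce to this parallel case.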