The human reasoning I’m comparing with also uses long reasoning traces, so unlocking that capability is part of the premise (many kinds of test-time compute parallelize, but not in this case, so the analogy is narrower than test-time compute in general). The question is how much you can get from reasoning traces 3 orders of magnitude longer still, beyond the first additional 3 orders of magnitude, while thinking at a quality below that of the reference human. Current o1-like post-training doesn’t yet promise that scaling goes that far (such traces won’t even fit in a context window, and who knows whether the scaling continues once workarounds for that are in place).
Human experience suggests to me that in humans, scaling doesn’t go that far either. When a problem can be effectively reduced to simpler problems, it wasn’t that difficult after all. And so the ratchet of science advances at a linear rather than logarithmic speed, within the bounds of human-feasible difficulty. A 300x excess in speed is a lot for the slowdown to overcome, when that slowdown comes from reasoning traces orders of magnitude longer than anything feasible in human experience, spent on a single problem that resists more modular analysis.
For biological reasons, humans do not think about problems for 1000s of years. A human who gives a problem a good 2-hour think is within <3 OOM of a human who spends their entire career working on a single problem.