I think the best argument for a “fastest-er” timeline is that several of your bottlenecks end up heavily substituting for one another, or for some common factor. An NLP researcher in 2015 might reasonably have guessed it would take decades to reach the level of ChatGPT: after all, that would require separate breakthroughs in parsing, entailment, word-sense disambiguation, semantics, world knowledge… In reality these capabilities were all blessings of scale.
o1 may or may not be the central breakthrough in this scenario, but I can paint a world where it is, and that world is very fast indeed. RL, like next-word prediction, is “mechanism-agnostic”: it induces whatever capabilities are instrumental to maximizing reward. Already, “back-tracking” or “error-correcting” behavior has emerged, which LeCun and others had previously cited as a fundamental obstacle. In this world, RL applied to “Chains of Action” strongly induces agentic capacities like adaptability, tool use, note caching, goal-directedness, and coherence. Gradient descent successfully routes around any weaknesses in LLMs (as I suspect it already does for weaknesses in our LLM training pipelines today). By the time post-training has absorbed the same engineering effort we’ve dedicated to pretraining, we have AGI. Given the scale of OpenAI’s investment, we should be able to rule on this scenario pretty quickly.
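To make “mechanism-agnostic” concrete, here is a minimal toy sketch of outcome-only RL in the spirit of REINFORCE. Everything in it (the toy environment, the action names, the hyperparameters) is my own illustrative assumption, not anyone’s actual training pipeline: the policy sees a single end-of-episode reward, and any intermediate behavior, including the “backtracking” direction, is reinforced only insofar as it helps achieve the outcome.

```python
# Toy sketch: outcome-only RL on a chain of actions (REINFORCE, no baseline).
# The reward never says *how* to reach the goal; whatever intermediate
# behavior helps (including moving "backwards") gets reinforced.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, GOAL, MAX_STEPS = 8, 5, 12
ACTIONS = (+1, -1)                           # "advance" and "backtrack"
logits = np.zeros((N_STATES, len(ACTIONS)))  # tabular softmax policy

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def rollout():
    """Sample a chain of actions; the reward is outcome-only: 1 iff we reach GOAL."""
    s, traj = 0, []
    for _ in range(MAX_STEPS):
        p = softmax(logits[s])
        a = rng.choice(len(ACTIONS), p=p)
        traj.append((s, a))
        s = (s + ACTIONS[a]) % N_STATES
        if s == GOAL:                        # stop once the outcome is achieved
            break
    return traj, float(s == GOAL)

LR = 0.5
for _ in range(2000):
    traj, reward = rollout()
    # REINFORCE: raise the log-probability of every action taken, scaled by
    # the single end-of-episode reward; no per-step shaping, no mechanism prior.
    for s, a in traj:
        p = softmax(logits[s])
        grad = -p
        grad[a] += 1.0                       # d/d logits[s] of log pi(a|s)
        logits[s] += LR * reward * grad

success = np.mean([rollout()[1] for _ in range(500)])
print(f"outcome success rate after training: {success:.2f}")
```

Nothing in the reward asks for backtracking; if the trained policy settles on the three-step “backwards” route (0 → 7 → 6 → 5) rather than the five-step forward one, it is only because that route is instrumentally easier to complete within the step budget.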
I’d say I’m skeptical of the specifics (RL hasn’t demonstrated this kind of rich success in the past) but more uncertain about the broad outline (how well could, e.g., fast adaptation trade off against multimodality, gullibility, or coherence?).