I’m a bit confused by what you mean by “LLMs will not scale to AGI” in combination with “a single innovation is all that is needed for AGI”.
E.g., consider the following scenarios:
AGI (in the sense you mean) is achieved by figuring out a somewhat better RL scheme and massively scaling this up on GPT-6.
AGI is achieved by applying some sort of architectural hack on top of GPT-6 that lets it reason in neuralese for longer, and then doing a bunch of training to teach the model to use this well.
AGI is achieved via some sort of iterative RL/synthetic-data/self-improvement process for GPT-6, in which GPT-6 generates vast amounts of synthetic data for itself using various tools.
IMO, these sound very similar to “LLMs scale to AGI” for many practical purposes:
LLM scaling is required for AGI
LLM scaling drives the innovation required for AGI
From the public’s perspective, it may just look like AI progress is driven by LLMs getting better over time, with various tweaks being continuously introduced.
Maybe the key point in your view is that the single innovation is genuinely discontinuous, and perhaps that innovation doesn’t actually require LLM scaling.