The fact that RL now seems to work well on LLMs without special tricks, as reported by many replications of R1, suggests to me that AGI is indeed not far off.
Still, at least until base model effective training compute is scaled another 1,000x (which would be 2028-2029), this kind of RL training probably won't generalize far enough without neural (LLM) rewards, and those for now don't let RL scale as far as explicitly coded verifiers do.
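To make the distinction concrete, here's a minimal sketch in Python of the two reward types: an explicitly coded verifier that checks an answer exactly, and a neural reward that asks an LLM judge for a score. All names are hypothetical, and the "####" answer delimiter is just an illustrative convention, not anyone's actual setup.

```python
# Minimal sketch of the two reward types (all names hypothetical).

def coded_verifier_reward(response: str, ground_truth: str) -> float:
    """Explicitly coded verifier: exact, cheap, and hard to game,
    but only available in checkable domains (math answers, code tests)."""
    answer = response.split("####")[-1].strip()  # extract final answer
    return 1.0 if answer == ground_truth else 0.0

def neural_reward(prompt: str, response: str, judge) -> float:
    """Neural (LLM) reward: covers open-ended tasks, but is noisy and
    exploitable, so it can absorb less RL optimization pressure
    before the policy starts gaming it.  `judge` is any callable
    that sends a prompt to an LLM and returns its text reply."""
    score = judge(
        "Rate from 0 to 10 how well the response answers the prompt.\n"
        f"Prompt: {prompt}\nResponse: {response}\n"
        "Reply with a number only."
    )
    return float(score) / 10.0
```

The asymmetry is visible even at this toy scale: the verifier is a fixed program the policy can't argue with, while the neural reward is itself a model output that heavy RL optimization will eventually find the soft spots of.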