Another reason to think that LLM AGI will have memory/state, conditional on AGI being built, is that lack of memory is probably the only remaining blocker to something like drop-in remote workers, and from there to AGI and ASI: persistent state would allow potentially unbounded meta-learning given unbounded resources, and would make meta-learning in general far more effective over longer time horizons.
Gwern explains here how the lack of continual (meta-)learning accounts for basically all of the baffling LLM weaknesses. The short version: LLM weights are frozen after training, so they have zero neuroplasticity once deployed (modulo in-context learning, which is far too weak to matter). An LLM can learn no new tricks after release, and for all but the simplest tasks it turns out that learning has to happen continuously, which was the key limitation of GPT-N-style AIs that we didn't really recognize.
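To make the frozen-weights point concrete, here is a minimal, purely illustrative sketch (the toy model and numbers are my own assumptions, not how any lab actually serves an LLM): a deployed model only runs forward passes, while a continually learning variant would also take a small gradient step on each new experience.

```python
import torch
import torch.nn as nn

# Toy stand-in for an LLM; the specific model is hypothetical, for illustration only.
model = nn.Linear(16, 16)

# Deployment today: weights are frozen, the model only does forward passes.
for p in model.parameters():
    p.requires_grad_(False)

x = torch.randn(1, 16)
y_pred = model(x)  # no learning happens here, no matter how many queries it serves

# What continual learning would look like (not how current LLMs are served):
for p in model.parameters():
    p.requires_grad_(True)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

x_new, y_new = torch.randn(1, 16), torch.randn(1, 16)  # a new experience
loss = nn.functional.mse_loss(model(x_new), y_new)
loss.backward()
opt.step()       # the model is permanently changed by what it just saw
opt.zero_grad()
```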
More in the comment below:
Re the recurrence/memory aspect, you might like this new paper, which uses recurrent architectures (test-time-training layers) to generate a reasonably consistent one-minute Tom and Jerry cartoon, and whose authors argue, in the tweet below, that they managed to fix the training problems that plague vanilla RNNs (a simplified sketch of the test-time-training idea follows the links):
https://test-time-training.github.io/video-dit/assets/ttt_cvpr_2025.pdf
https://arxiv.org/abs/2407.04620
https://x.com/karansdalal/status/1810377853105828092 (This is the tweet I pointed to for the claim that they solved the problem of training vanilla RNNs)
https://x.com/karansdalal/status/1909312851795411093 (Tweet of the current paper)
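For intuition about what these test-time-training layers do, here is a heavily simplified sketch of the core idea as I understand it from the TTT paper (arXiv 2407.04620): the hidden state is itself the weight matrix of a tiny inner model, updated by one gradient step of a self-supervised reconstruction loss per token, so updating memory literally is learning. The projections, learnable inner loss, and mini-batched updates from the paper are omitted; the specifics below are my own simplification.

```python
import numpy as np

def ttt_layer(tokens, dim, lr=0.1):
    """Simplified TTT-style recurrent layer: the hidden state is a weight
    matrix W, updated by one gradient step of a reconstruction loss per token.
    (The paper's projections, mini-batching, and learned inner loss are omitted.)"""
    W = np.zeros((dim, dim))           # hidden state = parameters of a tiny linear model
    outputs = []
    for x in tokens:                   # x: (dim,) vector for one token
        # Inner self-supervised loss: l = 0.5 * ||W x - x||^2 (reconstruct the token)
        grad = np.outer(W @ x - x, x)  # d l / d W
        W = W - lr * grad              # "writing to memory" = a gradient step at test time
        outputs.append(W @ x)          # the output token uses the updated state
    return np.stack(outputs), W

seq = np.random.randn(32, 8)            # a toy sequence of 32 tokens, 8 dims each
out, final_state = ttt_layer(seq, dim=8)
print(out.shape, final_state.shape)     # (32, 8) (8, 8)
```

The pitch, as I read it, is that this hidden state is far more expressive than a fixed-size RNN vector (it can even be a small MLP's weights), which is what makes something like a coherent one-minute video, and speculatively longer-horizon memory, plausible.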
As a side note, I actually expect AI progress to slow down for at least a year, and potentially for up to 4-5 years, due to the tariffs inducing a recession, but this doesn't matter for the debate over whether LLMs can get to AGI.
I agree with the view that recurrence/hidden states would be a game-changer if they worked, because they would give the LLM a memory. Memoryless humans are far, far less employable than people with memory, largely because it's much easier to meta-learn strategies when you have one.
That said, I'm uncertain whether recurrence is actually necessary for LLMs to learn better or to have memory/state that persists beyond the context window, and I also think that meta-learning over long periods/having a memory is probably the only hard bottleneck left that might not get solved (though it likely will be, if these new papers are anything to go by).
I basically agree with @gwern's explanation of what LLMs are missing that keeps them from being AGIs (at least without a further couple of OOMs of compute; in the worst case they need exponential compute for linear gains):
https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/?commentId=hSkQG2N8rkKXosLEF
I think at most one further intervention is necessary, and one could argue that zero new insights are needed.
The other part is that I basically disagree with this assumption; more generally, I have a strong prior that a lot of problems get solved by muddling through/using semi-dumb strategies that work far better than they have any right to:
I think most worlds that survive the transition from AGI to ASI for at least 2 years, if not longer, will almost certainly involve a lot of dropped balls and fairly blind experimentation (helped along by the AI control agenda), as well as the world's offense-defense balance shifting toward a more defensive equilibrium.
I do place most of my probability mass for AI that can automate all AI research in the 2030s, but this is mostly because of the tariffs and because scaling up new innovations takes time, rather than because the difficulty of AGI is high.
Edit: @Vladimir_Nesov has convinced me that the tariffs themselves only delay things slightly; my remaining concern is the tariffs causing an economic recession, which would make AI investment fall quite a bit for a while.