Re the recurrence/memory aspect, you might like this new paper, which figured out how to use recurrent architectures to make a one-minute Tom and Jerry cartoon video that stays reasonably consistent; in the tweet below, the author argues that they managed to fix the training problems that come from training vanilla RNNs:

https://test-time-training.github.io/video-dit/assets/ttt_cvpr_2025.pdf
https://arxiv.org/abs/2407.04620
https://x.com/karansdalal/status/1810377853105828092 (the tweet I pointed to for the claim that they solved the issue of training vanilla RNNs)
https://x.com/karansdalal/status/1909312851795411093 (tweet of the current paper)
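To make "the training problems that come from training vanilla RNNs" concrete, here is a toy sketch of my own (not code from the paper): backpropagating through T recurrent steps multiplies T Jacobians together, so the gradient signal shrinks or blows up exponentially in T. Dropping the nonlinearity for clarity, the Jacobian of h_T with respect to h_0 is just W raised to the power T:

```python
# Toy illustration of vanishing/exploding gradients in a vanilla RNN.
# For the linear recurrence h_t = W @ h_{t-1}, the Jacobian of h_T with
# respect to h_0 is W**T, whose norm scales like (spectral radius)**T.
import numpy as np

def grad_norm(spectral_radius, steps, dim=16, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((dim, dim))
    # Rescale W so its largest eigenvalue magnitude equals spectral_radius.
    W *= spectral_radius / max(abs(np.linalg.eigvals(W)))
    # Spectral norm of the accumulated Jacobian d h_T / d h_0 = W**steps.
    return np.linalg.norm(np.linalg.matrix_power(W, steps), 2)

for rho in (0.9, 1.0, 1.1):
    print(rho, f"{grad_norm(rho, steps=100):.3g}")
# Radius below 1: the gradient all but vanishes after 100 steps.
# Radius above 1: it explodes; this is the knife-edge that makes
# vanilla RNNs hard to train on long sequences.
```

This is why fixes tend to either constrain the recurrence or, as in the test-time-training line of work, replace the fixed-size hidden state update with something easier to optimize.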
One note: I actually expect AI progress to slow down for at least a year, and potentially for 4-5 years, due to the tariffs inducing a recession, but this doesn’t matter for the debate over whether LLMs can get to AGI.
I agree with the view that recurrence/hidden states would be a game-changer if they worked, because they would give the LLM a memory, and memoryless humans are way, way less employable than people with memory: it’s much easier to meta-learn strategies when you have one.
That said, I’m uncertain that recurrence is necessary for LLMs to learn better and carry a memory/state beyond the context window, and I also think that meta-learning over long periods/having a memory is probably the only hard bottleneck left that might not get solved (though it likely will be, if these new papers are anything to go by).
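As a deliberately simple illustration of why memory enables meta-learning (entirely my own construction, not from the thread): on a two-armed bandit, an agent that remembers past payoffs quickly locks onto the better arm, while a memoryless agent can never do better than chance:

```python
# Toy two-armed bandit: memory lets an agent meta-learn which arm pays.
import random

def run(agent, steps=1000, probs=(0.2, 0.8), seed=1):
    """Average reward per step for `agent` on a Bernoulli bandit."""
    rng = random.Random(seed)
    total = 0
    memory = {0: [0, 0], 1: [0, 0]}  # arm -> [pulls, wins]
    for _ in range(steps):
        arm = agent(memory, rng)
        win = rng.random() < probs[arm]
        memory[arm][0] += 1
        memory[arm][1] += int(win)
        total += win
    return total / steps

def memoryless(memory, rng):
    return rng.randrange(2)  # cannot use the past at all

def remembers(memory, rng):
    # Explore each arm 20 times, then exploit the empirically better arm.
    for arm in (0, 1):
        if memory[arm][0] < 20:
            return arm
    return max((0, 1), key=lambda a: memory[a][1] / memory[a][0])

print(run(memoryless))  # roughly 0.5, the average of the two arms
print(run(remembers))   # roughly 0.8, close to the better arm's payoff
```

The gap between the two is exactly the kind of advantage a persistent state beyond the context window would buy an LLM.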
I basically agree with @gwern’s explanation of what LLMs are missing that makes them not AGIs (at the very least without a further couple of OOMs, and in the worst case they need exponential compute to get linear gains):

https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/?commentId=hSkQG2N8rkKXosLEF
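To unpack "exponential compute for linear gains" with a toy framing of my own (the constant k = 10 is an arbitrary illustrative value, not a measured number): it means each additional unit of capability multiplies the compute bill by a constant factor, so capability grows only logarithmically in compute:

```python
import math

# Worst-case scaling picture: every +1 "capability point" costs k times
# more compute than the last one (k = 10 here is made up for illustration).
def compute_needed(capability, base=1.0, k=10.0):
    return base * k ** capability

def capability_reached(compute, base=1.0, k=10.0):
    return math.log(compute / base, k)

print(compute_needed(3))               # 1000.0
print(round(capability_reached(1e6)))  # 6 -- a millionfold compute buys only 6 points
```

If this regime holds, throwing a couple more OOMs at the problem buys only a couple more capability points, which is the pessimistic branch of gwern’s argument.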
I think at most one new intervention is necessary, and one could argue that zero new insights are needed.
The other part here is that I basically disagree with this assumption; more generally, I have a strong prior that a lot of problems get solved by muddling through/using semi-dumb strategies that work way better than they have any right to:
I also depart from certain other details later; for instance, I think we’ll have better theory by the time we need to align human-level AI, and “muddling through” by blind experimentation probably won’t work or be the actual path taken by surviving worlds.
I think most worlds that survive AGI to ASI for at least 2 years, if not longer, will almost certainly include a lot of dropped balls and fairly blind experimentation (helped out by the AI control agenda), as well as the world’s offense-defense balance shifting to a more defensive equilibrium.
I do think most of my probability mass for AI that can automate all AI research is in the 2030s, but this is broadly due to the tariffs and scaling up new innovations taking some time, rather than the difficulty of AGI being high.
Edit: @Vladimir_Nesov has convinced me that the tariffs only delay things slightly, though my concern is that the tariffs cause an economic recession, which makes AI investment fall quite a bit for a while.
probability mass for AI that can automate all AI research is in the 2030s … broadly due to the tariffs and …
Without AGI, scaling of hardware runs into the financial ~$200bn individual training system cost wall in 2027-2029. Any tribulations on the way (or conversely efforts to pool heterogeneous and geographically distributed compute) only delay that point slightly (when compared to the current pace of increase in funding), and you end up in approximately the same place, slowing down to the speed of advancement in FLOP/s per watt (or per dollar). Without transformative AI, anything close to the current pace is unlikely to last into the 2030s.
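Nesov’s funding-wall point can be sanity-checked with toy arithmetic. All numbers below are my own illustrative assumptions (a ~$5bn frontier training system in 2024, with spending multiplying 3-4x per year), not figures from his comment:

```python
def year_funding_hits_wall(start_cost=5e9, growth=3.0, wall=2e11, start_year=2024):
    """First year the cost of the largest training system crosses the wall,
    if frontier spending keeps multiplying by `growth` every year."""
    year, cost = start_year, start_cost
    while cost < wall:
        year += 1
        cost *= growth
    return year

# Under these assumed numbers the wall lands in Nesov's 2027-2029 window:
print(year_funding_hits_wall(growth=3.0))  # 2028
print(year_funding_hits_wall(growth=4.0))  # 2027
```

After the wall, compute can only grow at the FLOP/s-per-dollar improvement rate, which is far slower than 3-4x per year, matching his conclusion that the current pace is unlikely to last into the 2030s without transformative AI.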