[I mostly wrote this to clarify my thoughts. I’m unsure whether this will be valuable for readers.]
I expect that within a decade, AI will be able to do 90% of current human jobs. I don’t mean that 90% of humans will be obsolete. I mean that the average worker could delegate 90% of their tasks to an AGI.
I feel confused about what this implies for the kind of AI long-term planning and strategizing that would enable an AI to create large-scale harm if it is poorly aligned.
Is the ability to achieve long-term goals hard for an AI to develop?
By long-term, I’m referring to goals that require both long time horizons, and some ability to forecast the results of multiple steps of interventions.
Evidence from Evolution
Evolution provides some evidence that it’s hard.
Few species seem to do anything that requires planning more than a few days in advance. The main examples that I can find of multi-month planning seem sufficiently specialized that they likely involve instincts that can’t be adapted to novel tasks: beavers constructing dams, and squirrels caching food.
Human success suggests there’s value in a more general ability to do long-term planning. So there was likely some selective pressure for it. The time it took for evolution to find human levels of planning suggests that it’s relatively hard.
Human infants have the ability to develop long-term planning abilities. It seems like they would benefit from having those planning abilities at birth, yet they take years to develop. According to ChatGPT:
Early Childhood (3-6 years): As children begin to develop better memory and the ability to project themselves into the future, there’s a budding understanding of time. However, their grasp of longer time periods is still immature. They might understand “tomorrow” but struggle with the concept of “next week” or “next month.”
Middle Childhood (7-10 years): During this phase, children’s understanding of time becomes more sophisticated, and they start to develop the ability to delay gratification and think ahead. For instance, they might save money to buy a desired toy or understand the idea of studying now to do well on a test later. However, their ability to plan for the long-term (e.g., months or years ahead) remains limited.
This evidence suggests that AIs might require longer training times, or more diverse interactions with the world, than I’d expect to be practical within 10 years.
Obstacles to Planning
I asked ChatGPT what obstacles there are to developing AIs that are capable of long-term planning. Its answers included Temporal Credit Assignment, Complexity of the Environment, Exploitation vs. Exploration Dilemma, and Feedback Delays.
I’ll frame my answer differently: it’s hard to develop causal models that are sufficiently general-purpose to handle a wide variety of scenarios.
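To make the first and last of ChatGPT's obstacles concrete, here is a minimal sketch of how a delayed reward fades under the discounting that reinforcement learning commonly uses. The discount factor and the delays are illustrative assumptions, not values from any particular system.

```python
def discounted_weight(gamma: float, delay: int) -> float:
    """Weight given to a reward that arrives `delay` steps in the future."""
    return gamma ** delay


def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a sequence of rewards."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


# Illustrative values only (assumptions).
GAMMA = 0.999
for delay in (10, 1_000, 24_000):
    print(delay, discounted_weight(GAMMA, delay))
```

Even with a discount factor this close to 1, a reward tens of thousands of steps away contributes almost nothing to the learning signal. That is one way of seeing why feedback delays make it hard to tell which early action deserves the credit.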
Will AI be Different?
Much knowledge can be acquired by observing correlations in a large dataset. Current AI training focuses almost exclusively on this.
In contrast, human childhood involves some active interventions on the child’s environment. I expect that to provide better evidence for constructing causal models.
That means that scaling up LLMs to roughly human levels will leave AIs with relatively weak abilities at causal modeling, and therefore relatively weak planning abilities.
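A toy sketch of why passive correlations can mislead where interventions do not: a hidden common cause Z drives both X and Y, so the two are strongly correlated in observed data even though X has no effect on Y. The variable names, noise levels, and sample size are all assumptions chosen for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def observe(n, rng):
    """Passive data: hidden Z drives both X and Y; X has no effect on Y."""
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        xs.append(z + 0.1 * rng.gauss(0, 1))
        ys.append(z + 0.1 * rng.gauss(0, 1))
    return xs, ys


def intervene(n, rng):
    """Active data: the learner sets X itself, cutting the link from Z to X."""
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        xs.append(rng.gauss(0, 1))  # X chosen independently of Z
        ys.append(z + 0.1 * rng.gauss(0, 1))
    return xs, ys


rng = random.Random(0)
obs_corr = pearson(*observe(5000, rng))
int_corr = pearson(*intervene(5000, rng))
print(f"observational correlation: {obs_corr:.2f}")
print(f"interventional correlation: {int_corr:.2f}")
```

A learner that only sees the observational data would predict Y well from X, but a plan that relies on changing X to move Y would fail. Only the interventional data reveals that.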
However, I don’t expect AI progress to be exclusively scaling up of LLMs. Robotics seems likely to become important. Robots will have training that causes them to develop more sophisticated causal models than a comparably smart LLM.
Will robots be a separate branch of AI, or will they be integrated with LLM knowledge? I expect at least some integration, if only to make them easy to instruct via natural language. I’m unsure whether there will be strong incentives to keep updating robots with the most powerful LLM-type knowledge.
Will robots be trained to have good causal models of humans? I can imagine that the answer is no, due to the difficulty of modeling humans and the relative simplicity of designing manufacturing plants to be robot-only environments. I have rather low confidence in that forecast.
How general-purpose will robots’ causal models become by default?
Best AI Planning So Far?
I looked for good examples of long-term planning in AIs.
OpenAI’s Minecraft-playing system seems relatively impressive. It achieved roughly human-level performance at crafting the diamond pickaxe. Human experts typically need 20 minutes and 24,000 actions to accomplish that.
But how much planning did the AI learn independently? Less than the summary implies. The task requires collecting 11 other items in sequence. It looks like they trained the AI with rewards for each item, so at any one stage of training it was only finding out how to collect one novel item in an otherwise familiar sequence.
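The difference between rewarding only the final goal and rewarding each intermediate item can be sketched as follows. This is an illustration of the general idea, not OpenAI's actual reward function. The item names are hypothetical placeholders, since all that is stated here is that there are 11 intermediate items.

```python
# Hypothetical placeholder names for the 11 intermediate items plus the goal.
ITEM_SEQUENCE = [f"item_{i}" for i in range(1, 12)] + ["diamond_pickaxe"]


def sparse_reward(collected):
    """Reward only once the final item has been obtained."""
    return 1.0 if ITEM_SEQUENCE[-1] in collected else 0.0


def milestone_reward(collected):
    """One unit of reward per item obtained in order, stopping at the first gap."""
    reward = 0.0
    for item in ITEM_SEQUENCE:
        if item not in collected:
            break
        reward += 1.0
    return reward


# An agent that has so far learned the first four steps of the sequence.
progress = set(ITEM_SEQUENCE[:4])
print(sparse_reward(progress), milestone_reward(progress))
```

With the milestone version, the agent gets a signal as soon as it collects the next item. It never has to discover the whole 12-step chain before seeing any reward, which is the part that would require planning.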
It still sounds impressive that they were able to do that, but that’s probably not close to what I’d call long-term planning. This research would have benefited from longer-term planning. Their failure to produce it is another small piece of evidence that long-term planning is hard.
Another Minecraft system, Voyager, plays Minecraft by writing blocks of code for each of the tasks it wants to perform. When performing a task that is composed of several subtasks, it can just reuse the functions it has already written to perform those subtasks. I see some impressive search and composition here, but not much planning.
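A rough sketch of what reusing already-written functions for subtasks might look like. This is not Voyager's code. The skill names, the `register` helper, and the step strings are all hypothetical, chosen only to show composition from a stored library.

```python
from typing import Callable, Dict, List

# Hypothetical skill library: task name -> function returning the primitive steps.
skills: Dict[str, Callable[[], List[str]]] = {}


def register(name: str, fn: Callable[[], List[str]]) -> None:
    """Store a skill so later tasks can call it by name."""
    skills[name] = fn


# Simple skills, written once.
register("collect_wood", lambda: ["find tree", "chop tree"])
register("craft_planks", lambda: ["open inventory", "convert wood to planks"])


# A composite skill mostly calls skills that are already in the library.
def craft_table() -> List[str]:
    return skills["collect_wood"]() + skills["craft_planks"]() + ["arrange planks"]


register("craft_table", craft_table)
print(skills["craft_table"]())
```

In this sketch the composite task is just a lookup and concatenation of stored routines. Nothing in it forecasts the consequences of alternative action sequences, which is the sense in which this looks more like search and composition than planning.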
If I stretch my imagination, I can see some chance that this approach will someday lead to human-level or better planning. But for now, it feels like AIs are planning at the level of a two-year-old human, versus being closer to a four-year-old at other reasoning abilities. I expect that relative maturity to continue for a while.
LeCun’s JEPA Model
Yann LeCun has a strategy for developing human-level planning, outlined in A Path Towards Autonomous Machine Intelligence:
Humans and many animals are able to conceive multilevel abstractions with which long-term predictions and long-term planning can be performed by decomposing complex actions into sequences of lower-level ones.
The capacity of JEPA to learn abstractions suggests an extension of the architecture to handle prediction at multiple time scales and multiple levels of abstraction. Intuitively, low-level representations contain a lot of details about the input, and can be used to predict in the short term. But it may be difficult to produce accurate long-term predictions with the same level of details. Conversely high-level, abstract representation may enable long-term predictions, but at the cost of eliminating a lot of details.
LeCun may well have one of the best approaches to human-level long-term planning. If so, his belief that human-level AI is a long way away constitutes some sort of evidence that planning will be slow to develop.
Conclusion
This kind of analysis has unavoidable uncertainties. There might be some simple tricks that make AI planning work better than human planning. But this analysis seems to be the best I can do.
I’m leaning toward expecting a nontrivial period in which AIs have mostly human-level abilities, but are too short-sighted for rogue AIs to be a major problem.
So I expect the most serious AI risks to be a few years further away than what I’d expect if I were predicting based on IQ-style tests.
I have moderate hopes for a period of more than a year in which AI assistants can contribute a fair amount to speeding up safety research.
In Minsky’s “Steps Towards Artificial Intelligence”, Planning is the second-last stage. The final stage is Induction, by which he means making its own models of the world.
As far as the current era of AI goes, you could say we saw the first signs of Planning in the primitive LLM-based agent ChaosGPT. It wasn’t very good at planning, but it did talk to itself about which courses of action to take.
Apart from the method of adding planning “scaffolding” to a transformer LLM, there is the rumor that Google’s Gemini combines the Monte Carlo Tree Search method of policy optimization, used in AlphaGo, with a transformer-like architecture.
I think next year’s AIs will probably be good at planning, and I’ll stick with my timeline, 0-5 years to superintelligence.
I think Minsky got those two stages the wrong way around.
Complex plans over long time horizons would need to be done over some nontrivial world model.
At this point I think the general shape of brain-inspired algorithms for efficient model-based planning is fairly obvious, but they translate into the use of large amounts (i.e. TBs) of ‘fast weight’ memory at different timescales (mostly in prefrontal cortex, BG, hippocampus-adjacent and associated) combined with true recurrence, which currently seems prohibitively expensive to translate directly into transformers on GPUs (fast weights are equivalent to a KV cache that is unique per experience sequence and thus expensive for inference). Further speculation on how to improve that probably shouldn’t be discussed in this public forum.
I think:
long term planning is hard but maybe not super hard
if your training uses short term feedback—which typically is all anyone who isn’t an evolution has time for, and even evolved systems need to use a lot of—then there’s usually some simpler solution than long term planning to satisfy that short term feedback, which means the system doesn’t typically reach the long term planning solution using gradient descent based on the short term feedback
under recursive self-improvement, a long-term-planner will tend to preserve its long-term-planning nature, while a non-long-term-planner will not care, making long-term planning an attractor state under recursive self-improvement
sufficiently advanced metacognition might be equivalent to recursive self-improvement
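The point above about short-term feedback can be illustrated with a toy example. This is a sketch under assumed rewards and state names, and it uses exhaustive lookahead rather than gradient descent. It shows only that an objective limited to short-term feedback is fully satisfied by a policy that never plans.

```python
# Toy deterministic environment (an assumption for illustration).
# Each state maps action -> (immediate reward, next state).
ENV = {
    "start": {"snack": (1.0, "done"), "invest": (0.0, "mid")},
    "mid": {"wait": (0.0, "late")},
    "late": {"harvest": (10.0, "done")},
    "done": {},
}


def best_value(state, horizon):
    """Best total reward reachable from `state` within `horizon` steps."""
    if horizon == 0 or not ENV[state]:
        return 0.0
    return max(r + best_value(nxt, horizon - 1) for r, nxt in ENV[state].values())


def choose(state, horizon):
    """Pick the action that looks best when looking `horizon` steps ahead."""
    return max(
        ENV[state],
        key=lambda a: ENV[state][a][0] + best_value(ENV[state][a][1], horizon - 1),
    )


print(choose("start", horizon=1))
print(choose("start", horizon=3))
```

When only the next step's reward is visible, "snack" is the optimal answer, and nothing pushes the system toward the three-step "invest" path. The larger payoff only shows up once the evaluation horizon covers the whole sequence.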
I realize now that some of this post was influenced by a post that I’d forgotten reading: Causal confusion as an argument against the scaling hypothesis, which does a better job of explaining what I meant by causal modeling being hard.