When Will AIs Develop Long-Term Planning?
[I mostly wrote this to clarify my thoughts. I’m unclear whether this will be valuable for readers.]
I expect that within a decade, AI will be able to do 90% of current
human jobs. I don’t mean that 90% of humans will be obsolete. I mean
that the average worker could delegate 90% of their tasks to an AGI.
I feel confused about what this implies for the kind of AI long-term
planning and strategizing that would enable an AI to create large-scale
harm if it is poorly aligned.
Is the ability to achieve long-term goals hard for an AI to develop?
By long-term, I’m referring to goals that require both long time
horizons, and some ability to forecast the results of multiple steps of
interventions.
Evidence from Evolution
Evolution provides some evidence that it’s hard.
Most species seem not to do anything that requires planning more than a few days in advance. The main examples of multi-month planning that I can find seem sufficiently specialized that they likely involve instincts that can’t be adapted to novel tasks: beavers constructing dams, and squirrels caching food.
Human success suggests there’s value in a more general ability to do
long-term planning. So there was likely some selective pressure for it.
The time it took for evolution to find human levels of planning suggests
that it’s relatively hard.
Human infants have the ability to develop long-term planning abilities.
It seems like they would benefit from having those planning abilities at
birth, yet they take years to develop. According to ChatGPT:
Early Childhood (3-6 years): As children begin to develop better
memory and the ability to project themselves into the future, there’s
a budding understanding of time. However, their grasp of longer time
periods is still immature. They might understand “tomorrow” but
struggle with the concept of “next week” or “next month.”
Middle Childhood (7-10 years): During this phase, children’s
understanding of time becomes more sophisticated, and they start to
develop the ability to delay gratification and think ahead. For
instance, they might save money to buy a desired toy or understand the
idea of studying now to do well on a test later. However, their
ability to plan for the long-term (e.g., months or years ahead)
remains limited.
This evidence suggests that AIs might require longer training times, or
more diverse interactions with the world, than I’d expect to be
practical within 10 years.
Obstacles to Planning
I asked ChatGPT what obstacles there are to developing AIs that are capable of long-term planning. Its answers included Temporal Credit Assignment, Complexity of the Environment, the Exploitation vs. Exploration Dilemma, and Feedback Delays.
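Temporal credit assignment is the easiest of these to make concrete. Here is a minimal toy sketch (my own illustration, not ChatGPT's, with made-up numbers): when the only informative reward arrives at the end of a long task, the discounted learning signal attributed to the earliest actions shrinks rapidly as the horizon grows, which is part of why long-horizon behavior is hard to learn.

```python
# Toy illustration (not from any particular RL library): how much credit a
# single end-of-task reward assigns to the very first action, as the task
# horizon grows. With discounting, early actions receive a vanishing signal.

def credit_for_first_action(horizon: int, gamma: float = 0.99) -> float:
    """Discounted credit the first action receives from a reward at step `horizon`."""
    return gamma ** horizon

for horizon in [10, 100, 1000, 10000]:
    print(f"horizon={horizon:>6}: credit={credit_for_first_action(horizon):.6f}")

# horizon=    10: credit ~ 0.904
# horizon=   100: credit ~ 0.366
# horizon=  1000: credit ~ 0.000043
# horizon= 10000: credit ~ 0.000000
```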
I’ll frame my answer differently: it’s hard to develop causal models that are sufficiently general-purpose to handle a wide variety of scenarios.
Will AI be Different?
Much knowledge can be acquired by observing correlations in a large
dataset. Current AI training focuses almost exclusively on this.
In contrast, human childhood involves some active interventions on the
child’s environment. I expect that to provide better evidence for
constructing causal models.
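To make the observation-versus-intervention distinction concrete, here is a toy simulation (my own hypothetical example, with made-up variables): a hidden confounder makes two variables correlate in passively observed data, but actively intervening on one of them reveals that it has no causal effect on the other.

```python
# Toy example (hypothetical variables): correlation from passive observation
# vs. the effect revealed by intervention. A hidden confounder (poor sleep)
# drives both screen_time and grumpiness; screen_time has no causal effect.

import random

random.seed(0)

def observe(n=10_000):
    """Passively observe the world: the confounder raises both variables."""
    data = []
    for _ in range(n):
        poor_sleep = random.random() < 0.5
        screen_time = random.gauss(3.0 if poor_sleep else 1.0, 0.5)
        grumpiness = random.gauss(4.0 if poor_sleep else 1.0, 0.5)
        data.append((screen_time, grumpiness))
    return data

def intervene(set_screen_time, n=10_000):
    """Intervene: we set screen_time ourselves, breaking its link to the confounder.
    Note that set_screen_time deliberately does not enter the grumpiness mechanism."""
    results = []
    for _ in range(n):
        poor_sleep = random.random() < 0.5
        grumpiness = random.gauss(4.0 if poor_sleep else 1.0, 0.5)
        results.append(grumpiness)
    return sum(results) / n

obs = observe()
high = [g for s, g in obs if s > 2.0]
low = [g for s, g in obs if s <= 2.0]
print("observed: grumpiness given high vs. low screen time:",
      round(sum(high) / len(high), 2), "vs.", round(sum(low) / len(low), 2))
print("intervened: grumpiness when screen time is set high vs. low:",
      round(intervene(4.0), 2), "vs.", round(intervene(0.5), 2))
# Observation shows a large gap; intervention shows essentially none, because
# screen_time never caused grumpiness in this toy world.
```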
That means that scaling up LLMs to roughly human levels will leave AIs
with relatively weak abilities at causal modeling, and therefore
relatively weak planning abilities.
However, I don’t expect AI progress to consist exclusively of scaling up LLMs. Robotics seems likely to become important. Robots will have training that causes them to develop more sophisticated causal models than a comparably smart LLM.
Will robots be a separate branch of AI, or will they be integrated with LLM knowledge? I expect at least some integration, if only to make them easy to instruct via natural language. I’m unclear whether there will be strong incentives to keep updating robots with the most powerful LLM-type knowledge.
Will robots be trained to have good causal models of humans? I can
imagine that the answer is no, due to the difficulty of modeling humans
and the relative simplicity of designing manufacturing plants to be
robot-only environments. I have rather low confidence in that forecast.
How general-purpose will robots’ causal models become by default?
Best AI Planning So Far?
I looked for good examples of long-term planning in AIs.
OpenAI’s Minecraft-playing system seems relatively impressive. It achieved roughly human-level performance at crafting a diamond pickaxe. Human experts typically need about 20 minutes and 24,000 actions to accomplish that.
But how much planning did the AI learn independently? Less than the
summary implies. The task requires collecting 11 other items in
sequence. It looks like they trained the AI with rewards for each item,
so at any one stage of training it was only finding out how to collect
one novel item in an otherwise familiar sequence.
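As a rough sketch of what that kind of reward shaping looks like (my own toy reconstruction; the item list is approximate and the reward values are made up, not OpenAI's actual code), each intermediate item in the tech tree gets its own reward, so the agent never has to discover the full sequence on its own:

```python
# Toy sketch of per-item reward shaping for a sequential crafting task.
# Item names roughly follow Minecraft's tech tree; exact list and values are made up.

CRAFTING_SEQUENCE = [
    "log", "planks", "stick", "crafting_table", "wooden_pickaxe",
    "cobblestone", "stone_pickaxe", "furnace", "iron_ore", "iron_ingot",
    "iron_pickaxe", "diamond_pickaxe",
]

def shaped_reward(items_collected_so_far: set[str], new_item: str) -> float:
    """Reward the first time each intermediate item is obtained.

    With this shaping, learning to reach the diamond pickaxe reduces to
    learning one new step at a time, rather than planning the whole sequence.
    """
    if new_item in CRAFTING_SEQUENCE and new_item not in items_collected_so_far:
        return 1.0  # made-up value; a real system might weight later items more heavily
    return 0.0

print(shaped_reward({"log", "planks"}, "stick"))   # 1.0: new item in the sequence
print(shaped_reward({"log", "planks"}, "planks"))  # 0.0: already collected
```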
It still sounds impressive that they were able to do that, but it’s probably not close to what I’d call long-term planning. This research would have benefited from an AI that could plan over longer horizons; the researchers’ failure to produce one is another small piece of evidence that long-term planning is hard.
Another Minecraft system, Voyager, plays Minecraft by writing a block of code for each task it wants to perform. When performing a task that is composed of several subtasks, it can reuse the functions it has already written for those subtasks. I see some impressive search and composition here, but not much planning.
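A minimal sketch of that skill-reuse pattern (my own illustration in Python; Voyager itself generates game code against a Minecraft API and verifies it in the environment): each learned skill is stored as a named function, and a later, composite skill is mostly a new function that calls the stored ones.

```python
# Toy sketch of a Voyager-style skill library (illustrative only).

from typing import Callable, Dict

skill_library: Dict[str, Callable[[], None]] = {}

def add_skill(name: str, fn: Callable[[], None]) -> None:
    """Store a newly written skill so later skills can call it by name."""
    skill_library[name] = fn

# Simple skills get written and stored first...
add_skill("chop_tree", lambda: print("chopping a tree"))
add_skill("craft_planks", lambda: print("crafting planks"))

# ...and a later, composite skill is mostly composition of existing ones.
def craft_crafting_table() -> None:
    skill_library["chop_tree"]()
    skill_library["craft_planks"]()
    print("crafting a crafting table")

add_skill("craft_crafting_table", craft_crafting_table)
skill_library["craft_crafting_table"]()
```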
If I stretch my imagination, I can see some chance that this approach will someday lead to human-level or better planning. But for now, it feels like AIs are planning at the level of a two-year-old human, while being closer to a four-year-old at other reasoning abilities. I expect that gap in maturity to persist for a while.
LeCun’s JEPA Model
Yann LeCun has a strategy for developing human-level planning, outlined in A Path Towards Autonomous Machine Intelligence:
Humans and many animals are able to conceive multilevel abstractions with which long-term predictions and long-term planning can be performed by decomposing complex actions into sequences of lower-level ones.
The capacity of JEPA to learn abstractions suggests an extension of the architecture to handle prediction at multiple time scales and multiple levels of abstraction. Intuitively, low-level representations contain a lot of details about the input, and can be used to predict in the short term. But it may be difficult to produce accurate long-term predictions with the same level of details. Conversely high-level, abstract representation may enable long-term predictions, but at the cost of eliminating a lot of details.
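As a very loose sketch of the multi-timescale idea (my own illustration of the concept, not LeCun's architecture or any released JEPA code): a detailed low-level model predicts a few steps ahead, a coarser high-level model predicts much further ahead over abstracted states, and a planner combines the two.

```python
# Conceptual sketch only: prediction at two time scales over two levels of
# abstraction. The "models" here are placeholder functions, not trained networks.

from typing import Dict, List

def predict_low_level(detailed: Dict[str, int], steps: int) -> Dict[str, int]:
    """Short-horizon prediction over a detailed state (placeholder dynamics)."""
    return {"position": detailed["position"] + steps, "time": detailed["time"] + steps}

def abstract(detailed: Dict[str, int]) -> Dict[str, int]:
    """Drop detail, keeping only what matters for long-horizon prediction."""
    return {"region": detailed["position"] // 100, "time": detailed["time"]}

def predict_high_level(coarse: Dict[str, int], coarse_steps: int) -> Dict[str, int]:
    """Long-horizon prediction over the abstract state (placeholder dynamics)."""
    return {"region": coarse["region"] + coarse_steps,
            "time": coarse["time"] + 100 * coarse_steps}

def plan(detailed: Dict[str, int]) -> List[Dict[str, int]]:
    """Detailed predictions for the near term, abstract subgoals for the long term."""
    near_term = [predict_low_level(detailed, k) for k in range(1, 4)]
    long_term = [predict_high_level(abstract(detailed), k) for k in range(1, 4)]
    return near_term + long_term

print(plan({"position": 250, "time": 0}))
```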
LeCun may well have one of the best approaches to human-level long-term planning. If so, his belief that human-level AI is a long way away constitutes some evidence that planning abilities will be slow to develop.
Conclusion
This kind of analysis has unavoidable uncertainties. There might be some
simple tricks that make AI planning work better than human planning. But
this analysis seems to be the best I can do.
I’m leaning toward expecting a nontrivial period in which AIs have
mostly human-level abilities, but are too short-sighted for rogue AIs to
be a major problem.
So I expect the most serious AI risks to be a few years further away
than what I’d expect if I were predicting based on IQ-style tests.
I have moderate hopes for a period of more than a year in which AI
assistants can contribute a fair amount to speeding up safety research.