I think he’s totally right that there’s a missing ability in LLMs. He doesn’t claim this will be a big blocker. I think we’d be fools to assume this gives us much more time.
My previous comment on this podcast, which addressed pretty much this question, says more.
Briefly: There might be a variety of fairly easy ways to add more System 2 thinking and problem-solving and reasoning in genuinely new domains. Here’s one for system 2 thinking, and here’s one for better reasoning and knowledge discovery. There might easily be six more that people are busy implementing, half of which will work pretty quickly.
This could be a bottleneck that gives us extra years, but assuming it will is a very bad idea. We should step lively on the whole alignment project in case this intelligence stuff turns out to be a lot easier than we thought it was before we had enough compute and deep nets that really work.
WRT the consensus you mention: there’s no consensus, here or elsewhere. Nobody knows, and taking an average would be a bad idea. The distribution among people who’ve got the right expertise (or as close to it as we get right now) and who spend time on prediction is still very broad, which says pretty clearly that nobody knows. That goes for this question as well as for every other timeline question; we can’t be sure until it’s built and working. I’ve got lots of reasoning behind my guess that this won’t take long to solve, but I wouldn’t place heavy odds on being right.
That broad distribution is why the smart bet is to have an alignment solution ready for the shorter projected timelines.
There might be a variety of fairly easy ways to add more System 2 thinking and problem-solving and reasoning in genuinely new domains. Here’s one for system 2 thinking
This approach seemed more plausible to me a year ago than it does now. It seemed feasible enough that I sketched out takeover scenarios along those lines (eg here). But I think we should update on the fact that there doesn’t seem to have been much progress in this direction since then, despite eg Auto-GPT getting $12M in funding in November, and lots of other startups racing for commercially useful scaffolded agentic systems. Maybe there are significant successes in that area that I’m unaware of?
Of course a year isn’t that long. But I still think it warrants an update if I’m not missing something. And ‘LLMs are currently incapable of dealing with novel situations’ is the best explanation I see for why that hasn’t happened.
There are other interesting places where LLMs fail badly at reasoning, eg planning problems like blocks-world or scheduling meetings between people with availability constraints; see eg this paper & other work from Kambhampati.
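To make those failure cases concrete, here’s a toy blocks-world instance and plan checker of the kind you could use to probe an LLM planner. The state encoding and helper names are my own illustration, not taken from Kambhampati’s benchmarks.

```python
# Toy blocks-world: `on` maps each block to what it sits on ("table" or a block).
def is_clear(on, block):
    """A block is clear if nothing is stacked on top of it."""
    return all(support != block for support in on.values())

def apply_move(on, block, dest):
    """Move `block` onto `dest`. Returns the new state or raises on an illegal move."""
    if block == dest or block not in on:
        raise ValueError(f"bad move: {block} -> {dest}")
    if not is_clear(on, block):
        raise ValueError(f"{block} has something on top of it")
    if dest != "table" and (dest not in on or not is_clear(on, dest)):
        raise ValueError(f"{dest} is not a clear block")
    new_on = dict(on)
    new_on[block] = dest
    return new_on

def plan_reaches_goal(start, goal, plan):
    """Check a proposed plan (e.g. one written by an LLM) step by step."""
    state = dict(start)
    for block, dest in plan:
        state = apply_move(state, block, dest)
    return state == goal

# Start: C sits on A; A and B are on the table.  Goal: the stack A-on-B-on-C.
start = {"A": "table", "B": "table", "C": "A"}
goal = {"C": "table", "B": "C", "A": "B"}
print(plan_reaches_goal(start, goal, [("C", "table"), ("B", "C"), ("A", "B")]))  # True
```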
I’ve been considering putting some time into this as a research direction; the ML community has a literature on the topic but it doesn’t seem to have been discussed much in AIS, although the ARC prize could change that. I have an initial sketch of such a direction here, combining lit review & experimentation. Feedback welcomed!
Note: one possible reason that scaffolded agents haven’t succeeded better yet is the argument that Sholto Douglas & Trenton Bricken make on a recent Dwarkesh Patel podcast: that you just need another couple of orders of magnitude of reliability before you can string together substantial chains of subgoals and get good results.
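A back-of-the-envelope illustration of that reliability argument (the numbers are made up): per-step success compounds multiplicatively over a chain of dependent subgoals, so the gap between 90% and 99.9% per-step reliability is the gap between hopeless and workable for long chains.

```python
# How per-step reliability compounds over a chain of n dependent subgoals.
# At 0.9 per step, 50 steps succeed ~0.5% of the time; at 0.999, ~95% of the time.
for p in (0.90, 0.99, 0.999):
    row = ", ".join(f"n={n}: {p ** n:.3f}" for n in (10, 50, 200))
    print(f"per-step success {p}: {row}")
```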
This is a good point. I keep expecting to see useful agents released, or to hear the open-source community get excited about its successes with non-commercial projects. Neither has happened, so we should update at least a bit: this is harder or less useful than it first seemed.
But I think there’s still a good chance that this is the fastest and most obvious route to AGI. In the article I linked, about a year old now, I didn’t predict that GPT4 could be turned into AGI, just that LLMs could; I noted that it would more likely be GPT5 or GPT6, combined with scaffolding, that becomes very useful and very dangerous relatively easily. The dumber the model, the better and more elaborate the scaffolding needs to be to get it past the point of autonomous reasoning and usefulness.
There are two factors you don’t mention. One is that the biggest blocker to commercial usefulness wasn’t reasoning ability; it was the ability to correctly interpret a webpage or other software. Multimodal models largely solved that, so most of the commercial dev effort probably went there until natively multimodal LLM APIs became available. That was about six months ago, which is still a long time. And that doesn’t account for less-commercial efforts.
The second is the possibility that GPT4 and the current gen just aren’t quite smart enough to have scaffolded System 2 work. The paper Large Language Models Cannot Self-Correct Reasoning Yet, from DeepMind and academic authors in October 2023, draws this conclusion. (The many reports of useful self-correction were based on terrible methodology that miscalculated base rates when you allow multiple guesses. Computer scientists are even worse at methodology than social scientists, apparently.)
The “Yet” in their title is important. They think a little more native reasoning ability would get them over the hump to doing useful self-correction. That’s one huge application of System 2 reasoning, but not all of it.
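To make the base-rate point concrete, here’s a tiny simulation (with made-up numbers) of that failure mode: if a model that answered wrong is told so and simply guesses again among the remaining options, measured accuracy rises with no genuine self-correction happening at all.

```python
import random

K = 4            # options per multiple-choice question (hypothetical setup)
P_FIRST = 0.60   # assumed single-shot accuracy
N = 100_000

single = retry = 0
for _ in range(N):
    ok = random.random() < P_FIRST
    single += ok
    if not ok:
        # "Self-correction" with oracle feedback: told it was wrong, the model
        # guesses uniformly among the K-1 remaining options, with zero insight.
        ok = random.random() < 1 / (K - 1)
    retry += ok

print(f"single guess:        {single / N:.3f}")   # ~0.600
print(f"guess-again version: {retry / N:.3f}")    # ~0.733, an illusory 'gain'
```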
My current fear is not that this System 2 scaffolding won’t work; it’s that it won’t work fast enough and easily enough to be the dominant approach. If we bake in the planning and reasoning abilities using RL, a lot of the advantages of language model agents disappear, and several big reasons to think we’ll get alignment wrong come back into play.
So I’m thinking that alignment people should actually help make scaffolded system 2 reasoning work, which is a pretty radical proposal relative to most alignment thought.
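To make “scaffolded System 2 reasoning” a bit more concrete: a generic propose-critique-revise loop looks roughly like the sketch below. This is only an illustration, not the specific proposals linked above, and call_llm is a hypothetical stand-in for whatever chat-model API you use.

```python
def call_llm(prompt: str) -> str:
    # Hypothetical stand-in; wire this to your model API of choice.
    raise NotImplementedError

def system2_answer(task: str, max_rounds: int = 3) -> str:
    """Propose an answer, then let the model critique and revise its own attempt."""
    draft = call_llm(f"Solve the following task step by step:\n{task}")
    for _ in range(max_rounds):
        critique = call_llm(
            "List concrete errors or unjustified steps in this attempt, "
            f"or reply OK if there are none.\n\nTask: {task}\n\nAttempt:\n{draft}"
        )
        if critique.strip().upper().startswith("OK"):
            break
        draft = call_llm(
            f"Revise the attempt to fix these problems.\n\nTask: {task}\n\n"
            f"Attempt:\n{draft}\n\nProblems:\n{critique}"
        )
    return draft
```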
But I think there’s still a good chance that this is the fastest and most obvious route to AGI.
Agreed that it’s quite plausible that LLMs with scaffolding basically scale to AGI. Mostly I’m just arguing that it’s an open question with important implications for safety & in particular timelines.
One is that the biggest blocker to commercial usefulness wasn’t reasoning ability; it was the ability to correctly interpret a webpage or other software.
I’m very skeptical of this with respect to web pages. Some pages include images (eg charts) that are necessary to understand the page content, but for many or most pages, the important content is text in an HTML file, and we know LLMs handle HTML just fine (since they can easily create it on demand).
The second is the possibility that GPT4 and the current gen just aren’t quite smart enough to have scaffolded System 2 work.
Agreed, this seems like a totally live possibility.
So I’m thinking that alignment people should actually help make scaffolded system 2 reasoning work, which is a pretty radical proposal relative to most alignment thought.
Personally I’d have to be a lot more confident that alignment of such systems just works to favor alignment researchers advancing capabilities; to me having additional time before AGI seems much more clearly valuable.
I was also surprised that interpreting webpages was a major blocker. They’re in text and HTML, as you say.
I don’t remember who said this, but I remember believing them since they’d actually tried to make useful agents. They said that actual modern webpages are such a flaming mess of complex HTML that the LLMs get confused easily.
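For what it’s worth, a quick way to see the problem is to compare a page’s raw HTML with its visible text; on heavy commercial pages the text an agent actually needs is often a small fraction of the file. A minimal standard-library sketch (the URL is just a placeholder):

```python
# Minimal sketch, standard library only: strip a page down to its visible text
# and compare sizes. The URL here is just a placeholder.
from html.parser import HTMLParser
from urllib.request import urlopen

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> blocks."""
    def __init__(self):
        super().__init__()
        self._skip = 0
        self.chunks = []
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

raw = urlopen("https://example.com").read().decode("utf-8", errors="replace")
parser = TextExtractor()
parser.feed(raw)
text = "\n".join(parser.chunks)
# On heavy commercial pages the ratio below is often tiny, which is the
# "flaming mess" problem: the content is buried in markup and scripts.
print(f"raw HTML: {len(raw):,} chars; visible text: {len(text):,} chars")
```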
Your last point, whether a push toward easier-to-align AGI or more time to work on alignment is preferable, is a very complex issue. I don’t have a strong opinion since I haven’t worked through it all. But I think there are very strong reasons to think LLM-based AGI is far easier to align than other forms, particularly if the successful approach doesn’t rely heavily on RL. So I think your opinion is in the majority, but nobody has worked it through carefully enough to have a really good guess. That’s a project I’d like to embark on by writing a post making the controversial suggestion that maybe we should be actively building LMA AGI as the safest of a bad set of options.
I also think we’ll get substantial info about the feasibility of LMA in the next six months. Progress on ARC-AGI will tell us a lot about LLMs as general reasoners, I think (and Redwood’s excellent new work on ARC-AGI has already updated me somewhat toward this not being a fundamental blocker). And I think GPT-5 will tell us a lot. ‘GPT-4 comes just short of being capable and reliable enough to work well for agentic scaffolding’ is a pretty plausible view. If that’s true, then we should see such scaffolding working a lot better with GPT-5; if it’s false, then we should see continued failures to make it really work.
I realized I didn’t really reply to your first point, and that it’s a really important one.
We’re in agreement that scaffolded LLMs are a possible first route to AGI, but not a guaranteed one.
If that’s the path, timelines are relatively short.
If that’s a possibility, we’d better have alignment solutions for that possible path, ASAP.
That’s why I continue to focus on aligning LMAs.
If other paths to AGI turn out to be the first routes, timelines are probably a little longer, so we’ve got a little longer to work on alignment for those types of systems. And there are more people working on RL-based alignment schemes (I think?)
He doesn’t claim this will be a big blocker.
He does, otherwise the claim that OpenAI pushed back AGI timelines by 5-10 years wouldn’t make sense.
I stand corrected. I didn’t remember that his estimate of the increase was that large. That seems like a very high estimate to me, for the reasons above.
However, part of that was about going from open research to closed.
That’s a project I’d like to embark on by writing a post making the controversial suggestion that maybe we should be actively building LMA AGI as the safest of a bad set of options.
I think that’d be a really valuable post!
That’s why I continue to focus on aligning LMAs.
I’m glad you’re working on it! I think your arguments are plausible and your approach is potentially promising.