Hooking up AI subsystems is predictably harder than you're implying. Humans are terrible at building AGI; the only thing we get to work is optimization under minimal structural assumptions. The connections between subsystems will have to be learned, not hardcoded, and that will be a bottleneck. Very possibly some unified system trained in a somewhat clever way will get there first.
You really think humans are terrible at building AGI after the sudden success of LLMs? I think success builds on success, and (neural net-based) intelligence is turning out to be actually a lot easier than we thought.
I have been involved in two major projects of hooking up different components of cognitive architectures. It was a nightmare, as you say. Yet there are already rapid advances in hooking up LLMs to different systems in different roles, for the reasons Nathan gives: their general intelligence makes them better at controlling other subsystems and taking in information from them.
Perhaps I should qualify what I mean by "easy". Five years is well within my timeline. That's not a lot of time to work on alignment. And less than five years for scary capabilities is also quite possible. It could be longer, which would be great, but shouldn't at least a significant subset of us be working on the shortest realistic timeline scenarios? Giving up on them makes no sense.
I’m not convinced that LLM agents are useful for anything.
Me either!
I’m convinced that they will be useful for a lot of things. Progress happens.
Eventually, yes. But agency is not sequence prediction + a few hacks. The remaining problems are hard. Massive compute, investment, and enthusiasm will lead to faster progress. I objected to 5-year timelines after ChatGPT, but now it's been a couple of years. I think 5 years is still too soon, but I'm not sure.
Edit: After Nathan offered to bet that my claim is false, I bet no on his market at 82%, which claims (roughly) that inference compute is as valuable as training compute for GPT-5: https://manifold.markets/NathanHelmBurger/gpt5-plus-scaffolding-and-inference. I expect this will be difficult to resolve, because o1 is the closest we will get to a GPT-5, and it presumably benefits from both more training (including RLHF) and more inference compute. I think it's perfectly possible that well-thought-out reinforcement learning can be as valuable as pretraining, but for practical purposes I expect that scaling inference compute on a base model will not see qualitative improvements. I will reach out about more closely related bets.
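To be concrete, by "scaling inference compute on a base model" I mean things like best-of-n sampling: drawing many candidate answers from a fixed model and keeping the best one. Here is a minimal sketch of that pattern; `generate` and `score` are hypothetical stand-ins for a sampler and an answer-quality heuristic, not any particular vendor's API.

```python
from typing import Callable, List

def best_of_n(prompt: str,
              generate: Callable[[str], str],
              score: Callable[[str, str], float],
              n: int = 16) -> str:
    """Sample n candidate answers and return the highest-scoring one.

    Inference cost grows roughly linearly with n, while the underlying
    model (and the training compute behind it) stays fixed.
    """
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda answer: score(prompt, answer))
```

The bet is, roughly, about whether spending compute on loops like this buys as much capability as spending it on a bigger or better-trained model.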
For dumb subsystems, yes. But the picture changes when one of the subsystems is a general intelligence. Putting an LLM in charge of controlling a robot seems like it should be hard, since robotics is always hard… and yet there has been a rash of recent successes as LLMs have gotten just barely general enough to do a decent job of it.
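The pattern behind those successes is roughly a loop in which the LLM reads a text description of the sensors and emits a text action. A rough sketch, assuming hypothetical `camera`, `llm`, and `arm` interfaces (none of these are a real robotics API):

```python
def control_loop(llm, camera, arm, goal: str, max_steps: int = 20) -> None:
    """Hypothetical sketch: let a language model pick the robot's next action."""
    for _ in range(max_steps):
        observation = camera.describe()        # subsystem output rendered as text
        prompt = (f"Goal: {goal}\nObservation: {observation}\n"
                  "Reply with exactly one action: MOVE <x> <y>, GRASP, RELEASE, or DONE.")
        action = llm.complete(prompt).strip()  # the LLM's text output is the control signal
        if action == "DONE":
            break
        arm.execute(action)                    # text action mapped to a subsystem command
```

The point is that the glue is mostly prompt formatting and parsing; the generality of the model does the heavy lifting.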
So my prediction is that as we make smarter and more generally capable models, a lot of the other specific barriers (such as embodiment, or emulated keyboard/mouse use) will fall away faster than you'd predict from past trends.
So then the question is how much difficulty there will be in hooking up the subsystems of the general intelligence module: memory, recursive reasoning, multi-modal sensory input handling, etc. A couple of years ago I was arguing with people that the jump from language-only to multi-modal would be quick, and also that soon after one group did it, many others would follow suit and it would become a new standard. This was met with skepticism at the time; people argued it would take longer and be more difficult than I was predicting, and that we should expect the change to happen further out in the future (e.g. > 5 years) and occur gradually. Now vision+language is common in the frontier models.
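To make the "hooking up" concrete, here is a rough sketch of wiring an LLM core to a memory subsystem. Everything here is a placeholder I'm inventing for illustration: `llm` and `embed` are hypothetical model interfaces, and the memory store is just an in-process list rather than a real vector database.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class Agent:
    llm: Callable[[str], str]               # hypothetical: prompt in, completion out
    embed: Callable[[str], List[float]]     # hypothetical: text to embedding vector
    memory: List[Tuple[List[float], str]] = field(default_factory=list)

    def remember(self, note: str) -> None:
        self.memory.append((self.embed(note), note))

    def recall(self, query: str, k: int = 3) -> List[str]:
        q = self.embed(query)
        dot = lambda v: sum(a * b for a, b in zip(q, v))   # crude similarity score
        return [note for _, note in
                sorted(self.memory, key=lambda m: dot(m[0]), reverse=True)[:k]]

    def step(self, task: str) -> str:
        context = "\n".join(self.recall(task))             # read from memory
        answer = self.llm(f"Relevant notes:\n{context}\n\nTask: {task}")
        self.remember(f"{task} -> {answer}")               # write back to memory
        return answer
```

Each subsystem connection ends up as another read-format-write loop like this one.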
So yeah, it's hard to do such things, but, like… it's a challenge which I expect teams of brilliant engineers with big research budgets to be able to conquer. Not hard in the sense that I expect them to try their best, fail, and be completely blocked for many years, leading to a general halt of progress across all existing teams.
For what it's worth, though I can't point to specific predictions, I was not at all surprised by multi-modality. It's still a token prediction problem; there are no fundamental theoretical differences. I think modestly more insight is necessary for these other problems.