I’ve been following the “tools want to become agents” argument since Holden Karnofsky raised the topic a long time ago, and I was almost convinced by the logic. But LLMs show a very surprising lack of agency, and, as far as I can tell, this gap between apparent intelligence and apparent agency was never predicted or expected by the alignment theorists. I would trust their cautions more if they had a model that makes good predictions.
but the LLMs show a very surprising lack of agency,
No, they don’t. LLMs show an enormous amount of agency. They will generate text which includes things like plans even without a prompt for that. (As should come as no surprise, because LLMs are RL agents which have been trained offline on data solely from agents, i.e. behavioral cloning.) They contain so much agency that you can plop them down into robots without any training whatsoever and they provide useful control. They are tools that want to be agents so much that I think it took all of a few weeks for someone to hook up the OA API to a command line in June 2020 and try to make it autonomous. LLMs like LaMDA started being granted access to the live Internet not long after that, simply because it’s so obviously useful for tool AIs to become agent AIs. Startups like Adept, dedicated solely to turning these tools into agents, followed not long after that. I hardly need mention Sydney. And here we are in the present, where OA has set up hundreds of plugins and a vast VM system to let their tools do agent-like things: autonomously write code, browse webpages, and integrate with other systems to make phone calls. LLMs show an enormous amount of agency, and it’s only increasing over time under competitive and economic pressure, because more agency makes them more useful. Exactly as predicted.
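The wiring involved really is minimal, which is part of the point. Here is a sketch of the kind of command-line loop described above, with the model call stubbed out — `model_complete` is a hypothetical placeholder, not any real API:

```python
import subprocess

def model_complete(prompt: str) -> str:
    """Stub standing in for a real LLM completion call; any text-in,
    text-out endpoint would slot in here (this name is hypothetical)."""
    return "echo hello from the agent"

def agent_step(goal: str) -> str:
    # The whole "tool becomes agent" wiring: ask the model for a shell
    # command, execute it, and return the output, which a real loop
    # would feed back to the model as its next observation.
    command = model_complete(f"Goal: {goal}\nNext shell command:")
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout.strip()

print(agent_step("greet the user"))  # prints the command's output
```

A real version just swaps the stub for an API call and repeats `agent_step` in a loop, appending each observation to the prompt — which is roughly all those early 2020 experiments did.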
Maybe we understand agency differently. Give LLMs tools to use and they will use them. But there is no discernible drive or “want” to change the world to their liking. I’m not saying that it won’t show up some day; it’s just conspicuously lagging the capabilities.

In other words, LLMs are not actively trying to “get out of the box”.

I guess that is one way to say it. But the statement is stronger than that, I think. They do not care about the box or about anything else. They react to stimuli, then go silent again.
So are you saying that you don’t think we’ll build agentic AI any time soonish? I’d love to hear your reasoning on that, because I’d rest easier if I felt the same way.
I agree that LLMs are marvelously non-agentic and intelligent. For the reasons I mentioned, I expect that to change, sooner or later, and probably sooner. Someone invented a marvelous new tool, and I haven’t heard a particular reason not to expect this one to become an agent given even a little bit of time and human effort. The argument isn’t that it happens instantly or automatically. AutoGPT and its kin failing on the first quick public try doesn’t seem like a good reason to expect similar language model agents to fail for a long time. I do think it’s possible they won’t work, but people will give it a more serious try than we’ve seen publicly so far. And if this approach doesn’t hit AGI, the next one will experience similar pressures to be made into an agent.
As for models that make good predictions, that would be nice, but we do probably need to get predictions about agentic, self-aware and potentially self-improving agents right on the first few tries. It’s always a judgment call on when the predictions are in the relevant domain. I think maintaining a broad window of uncertainty makes sense.
I do not know if we will or will not build something recognizably agentic any time soon. I am simply pointing out that currently there is a sizable gap that people did not predict back then. Given that we still have no good model of what constitutes values or drives (definitely not a utility function, since LLMs have plenty of that), I am very much uncertain about the future, and I would hesitate to unequivocally state that “AGI isn’t just a technology”. So far it most definitely is “just a technology”, despite the original expectations to the contrary by the alignment people.
Yes. But the whole point of the alignment effort is to look into the future, rather than have it run us over because we weren’t certain what would happen and so didn’t bother to make plans for the different things that could happen.
Yeah, I get that. But to look into the future one must take stock of the past and present and reevaluate models that gave wrong predictions. I have yet to see this happening.
The idea that agentiness is an advantage does not predict that there will never be an improvement made in other ways.
It predicts that we’ll add agentiness to those improvements. We are busy doing that. It will prove advantageous to some degree we don’t know yet: maybe huge, maybe so tiny that it’s essentially not used. But that’s only in the very near term. The same arguments will keep on applying forever, if they’re correct.
WRT your comment that we don’t have a handle on values or drives, I think that’s flat wrong. We have good models in both humans and AI. My post “Human preferences as RL critic values—implications for alignment” lays out the human side and one model for AI. But providing goals in natural language to a language model agent is another easy route to adding a functional analogue of values.
I will continue for now to focus my alignment efforts on futures where AGI is agentic, because those seem like the dangerous ones, and I have yet to hear any plausible future in which we thoroughly stick to tool AI and don’t agentize it at some point.
Edit: Thinking about this a little more, I do see one plausible future in which we don’t agentize tool AI: one with a “pivotal act”, probably carried out with powerful tool AI, that makes creating agentic AI impossible. In that future, the key bit is human motivations, which I think of as the societal alignment problem. That needs to be addressed to get alignment solutions implemented, so these two futures are addressed by the same work.