So are you saying that you don’t think we’ll build agentic AI any time soonish? I’d love to hear your reasoning on that, because I’d rest easier if I felt the same way.
I agree that LLMs are marvelously non-agentic and intelligent. For the reasons I mentioned, I expect that to change, sooner or later, and probably sooner. Someone invented a marvelous new tool, and I haven’t heard a particular reason not to expect this one to become an agent given even a little bit of time and human effort. The argument isn’t that it happens instantly or automatically. AutoGPT and similar agents failing on their first quick public try doesn’t seem like a good reason to expect language model agents to fail for a long time. I do think it’s possible they won’t work, but people will give it a more serious try than we’ve seen publicly so far. And if this approach doesn’t hit AGI, the next one will face similar pressures to be made into an agent.
As for models that make good predictions, that would be nice, but we probably need to get predictions about agentic, self-aware, and potentially self-improving agents right on the first few tries. It’s always a judgment call whether a given prediction falls in the relevant domain. I think maintaining a broad window of uncertainty makes sense.
I do not know whether we will build something recognizably agentic any time soon. I am simply pointing out that there is currently a sizable gap that people did not predict back then. Given that we still have no good model of what constitutes values or drives (definitely not a utility function, since LLMs have plenty of that), I am very much uncertain about the future, and I would hesitate to state unequivocally that “AGI isn’t just a technology”. So far it most definitely is “just a technology”, despite the original expectations to the contrary from the alignment people.
Yes. But the whole point of the alignment effort is to look into the future, rather than have it run us over because we weren’t certain what would happen and so didn’t bother to make any plans for the different things that could happen.
Yeah, I get that. But to look into the future one must take stock of the past and present and reevaluate models that gave wrong predictions. I have yet to see this happening.
The idea that agentiness is an advantage does not predict that no improvements will ever be made in other ways.
It predicts that we’ll add agentiness to those improvements. We are busy doing that. It will prove advantageous to some degree we don’t know yet, maybe huge, maybe so tiny it’s essentially not used. But that’s only in the very near term. The same arguments will keep on applying forever, if they’re correct.
WRT your comment that we don’t have a handle on values or drives, I think that’s flat wrong. We have good models of both, in humans and in AI. My post “Human preferences as RL critic values—implications for alignment” lays out the human side and one model for the AI side. But providing goals in natural language to a language model agent is another easy route to adding a functional analogue of values; the rough sketch below shows what I mean.
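A minimal sketch of that last route (purely illustrative, not from my post; call_llm is a stand-in for whatever model interface you actually use). The fixed goal string plays the functional role of a value because it is re-injected into every step and shapes every action the agent proposes:

```python
# Illustrative sketch only: an agent loop whose "values" are nothing more than
# a fixed natural-language goal string re-injected at every step.
# call_llm is a placeholder for a real model call (API client, local model, etc.).

GOAL = "Triage my inbox and draft replies, but never send anything without my approval."


def call_llm(prompt: str) -> str:
    """Stub for an actual language model call."""
    raise NotImplementedError("plug in a real model here")


def agent_step(goal: str, observation: str, history: list[str]) -> str:
    """Ask the model for the next action, with the standing goal always in view."""
    prompt = (
        f"You are an agent. Your standing goal:\n{goal}\n\n"
        "Your recent actions:\n" + "\n".join(history[-5:]) + "\n\n"
        f"Current observation:\n{observation}\n\n"
        "Reply with the single next action you will take."
    )
    return call_llm(prompt)
```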
I will continue for now to focus my alignment efforts on futures where AGI is agentic, because those seem like the dangerous ones, and I have yet to hear any plausible future in which we thoroughly stick to tool AI and don’t agentize it at some point.
Edit: Thinking about this a little more, I do see one plausible future in which we don’t agentize tool AI: one with a “pivotal act” that makes creating agentic AI impossible, probably involving powerful tool AI. In that future, the key bit is human motivations, which I think of as the societal alignment problem. That needs to be addressed to get alignment solutions implemented, so these two futures are addressed by the same work.