The idea that agentiness is an advantage does not predict that there will never be an improvement made in other ways.
It predicts that we’ll add agentiness to those improvements, and we are busy doing exactly that. How advantageous it proves is unknown: maybe hugely, maybe so marginally that agentized versions go essentially unused. But that’s only the very near term. If these arguments are correct, they will keep applying indefinitely.
WRT your comment that we don’t have a handle on values or drives: I think that’s flat wrong. We have good models of both in humans and in AI. My post “Human preferences as RL critic values—implications for alignment” lays out the human side and one model for AI. And providing goals in natural language to a language model agent is another easy route to adding a functional analogue of values.
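To make the last point concrete, here is a minimal sketch of what a natural-language goal as a functional analogue of values might look like in a language model agent. Everything here is illustrative: `query_model` is a hypothetical stand-in for a real LM API call (stubbed out so the snippet runs self-contained), and the goal text is invented.

```python
# Sketch: a language-model agent whose "values" are a natural-language
# goal injected into every prompt. Not any particular framework's API.

GOAL = "Help the user schedule meetings without double-booking anyone."

def query_model(prompt: str) -> str:
    # Stub: a real implementation would call an LM API here.
    # We echo the first prompt line to show the goal reached the model.
    return f"[action conditioned on: {prompt.splitlines()[0]}]"

def agent_step(observation: str) -> str:
    # The goal is prepended to every prompt, so each action the agent
    # selects is conditioned on it -- functioning like a stable value.
    prompt = f"Goal: {GOAL}\nObservation: {observation}\nNext action:"
    return query_model(prompt)

action = agent_step("User asked to book a meeting at 3pm Tuesday.")
print(action)
```

The design point is simply that the goal persists across every step of the agent loop, steering action selection the way a value or drive would, without any explicit utility function.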
I will continue for now to focus my alignment efforts on futures where AGI is agentic, because those seem like the dangerous ones, and I have yet to hear of any plausible future in which we thoroughly stick to tool AI and never agentize it.
Edit: Thinking about this a little more, I do see one plausible future in which we don’t agentize tool AI: one with a “pivotal act” that makes creating agentic AGI impossible, probably carried out using powerful tool AI. In that future, the key factor is human motivations, which I think of as the societal alignment problem. That problem needs to be addressed anyway to get alignment solutions implemented, so both futures are addressed by the same work.