I think you’re absolutely right.
And I think there’s approximately a 0% chance humanity will stop at pure language models, or even stop at o1 and o3, which very likely use RL to dramatically enhance capabilities.
Because they use RL not to accomplish things-in-the-world but to arrive at correct answers to questions they’re posed, the concerns you express (which pretty much anyone who’s been paying attention to AGI risks shares) are not fully in play.
OpenAI will continue on this path unless legislation stops them. And that’s highly unlikely to happen, because the argument against it is just not strong enough to convince the public or legislators.
We are mostly applying optimization pressure to our AGI systems to follow instructions and produce correct answers. Framed that way, it sounds like it’s as safe an approach as you could come up with for network-based AGI. I’m not saying it’s safe, but I am saying it’s hard to be sure it’s not without more detailed arguments and analysis. Which is what I’m trying to do in my work.
Also, as you say, it would be far safer to not make these things into agents. But the ease of doing so with a smart enough model and a prompt like “continue pursuing goal X using tools Y as appropriate to gather information and take actions” ensures that they will be turned into agents.
People want a system that actually does their work, not one that just tells them what to do. So they’re going to make agents out of smart LLMs. This won’t be stopped even with legislation; people will do it illegally or from jurisdictions that haven’t passed the laws.
So we are going to have to both hope and plan for this approach, including RL for correct answers, being safe enough. Or come up with way stronger and more convincing arguments for why it won’t be. I currently think it can be made safe in a realistic way with no major policy or research direction change. But I just don’t know, because I haven’t gotten enough people to engage deeply enough with the real difficulties and likely approaches.
Thank you, Seth, for the thoughtful reply. I agree with most of your points.
I agree that RL trained to accomplish things in the real world is far more dangerous than RL trained to just solve difficult mathematical problems (which in turn is more dangerous than vanilla language modeling). I worry that the real-world part will soon become commonplace, judging from current trends.
But even without the real-world part, models could still be incentivized to develop superhuman abilities and complex strategic thinking (which could be useful for solving mathematical and coding problems).
Regarding the chances of stopping/banning open-ended RL, I agree it’s a very tall order, but my impression of the advocacy/policy landscape is that people might be open to it under the right conditions. At any rate, I wasn’t trying to reason about what’s reasonable to ask for, only about the implications of different paths. I think the discussion should start there, and then we can consider what’s wise to advocate for.
For all of these reasons, I fully agree with you that work on demonstrating these risks in a rigorous and credible way is one of the most important efforts for AI safety.