I think this is a key crux of disagreement on alignment:
When I bring up the “actual RL algorithms don’t seem very dangerous or agenty to me” point, people often respond with “Future algorithms will be different and more dangerous”.
I think this is a bad response for many reasons.
On the one hand, empiricism and assuming that the future will be much like the past have a great track record.
On the other, predicting the future is the name of the game in alignment. And while the future is reliably much like the past, it’s never been exactly like the past.
So the considerations pull in both directions.
On the object level, I certainly agree that existing RL systems aren’t very agenty or dangerous. It seems like you’re predicting that people won’t make AI that’s particularly agentic any time soon. It seems to me that they’ll certainly want to. And I think it will be easy if non-agentic foundation models get good. Turning a smart foundation model into an agent is as simple as the prompt “make and execute a plan that accomplishes goal [x]. Use [these APIs] to gather information and take actions”.
I think this is what Alex was pointing to in the OP by saying
I’m worried about people turning AIs into agentic systems using scaffolding and other tricks, and then instructing the systems to complete large-scale projects.
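To make the "as simple as a prompt" claim concrete, here is a minimal sketch of that kind of scaffolding, in Python. Every name in it is an assumption for illustration: call_llm stands in for whatever completion API the model sits behind, and search_web / send_message are stand-ins for "[these APIs]"; a real agent wrapper would be more elaborate, but not fundamentally different.

```python
# Minimal sketch of "scaffolding" a foundation model into an agent loop.
# Everything named here is hypothetical: call_llm stands in for whatever
# completion API you use, and the two tools are placeholders for "[these APIs]".

def call_llm(prompt: str) -> str:
    """Stub for a foundation-model call; swap in a real API client."""
    return "FINISH: (model output would go here)"

def search_web(query: str) -> str:
    """Hypothetical information-gathering tool."""
    return f"search results for {query!r}"

def send_message(text: str) -> str:
    """Hypothetical action-taking tool."""
    return f"sent: {text}"

TOOLS = {"search_web": search_web, "send_message": send_message}

def run_agent(goal: str, max_steps: int = 10) -> str:
    # The entire "agentization" is this prompt plus a loop that executes tool calls.
    history = [
        f"Make and execute a plan that accomplishes goal: {goal}. "
        f"Available tools: {sorted(TOOLS)}. "
        "Reply 'TOOL: name(argument)' to act, or 'FINISH: result' when done."
    ]
    for _ in range(max_steps):
        reply = call_llm("\n".join(history))
        history.append(reply)
        if reply.startswith("FINISH:"):
            return reply.removeprefix("FINISH:").strip()
        if reply.startswith("TOOL:"):
            call = reply.removeprefix("TOOL:").strip()
            name, _, arg = call.partition("(")
            tool = TOOLS.get(name.strip())
            result = tool(arg.rstrip(")")) if tool else "unknown tool"
            history.append(f"RESULT: {result}")
    return "step limit reached"

if __name__ == "__main__":
    print(run_agent("summarize today's AI news"))
```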
I think this is the default future, so much so that I don’t think it matters whether agency would emerge through RL on its own. We’ll build it in. Humans are burdened with excessive curiosity, optimism, and ambition. Especially the type of humans that head AI/AGI projects.