I think your intuitions about costly international coordination are challenged by a few facts about the world.
1) Advanced RL, like open borders + housing deregulation, guarantees vast economic growth in wealthy countries. Open borders, in a way that seems kinda speculative but intuitively forceful for most people, has the potential to existentially threaten the integrity of a culture, including especially its norms; AI, in a way that seems kinda speculative but intuitively forceful for most people, has the potential to existentially threaten all life. The decisions of wealthy countries are apparently extremely strongly correlated, maybe in part for “we’re all human”-type reasons, and maybe in part because legislators and regulators know that they won’t get their ear chewed off for doing things like the US does. With immigration law, there is no attempt at coordination; quite the opposite (e.g. Syrian refugees in the EU).
2) The number of nuclear states is stunningly small if one follows the intuition that wildly uncompetitive behavior, which leaves significant value on the table, produces an unstable situation. Not every country needs to sign on eagerly to avoiding some of the scariest forms of AI. The US/EU/China can shape other countries’ incentives quite powerfully.
3) People in government do not seem to be very zealous about economic growth. Sorry this isn’t a very specific example. But their behavior on issue after issue does not seem very consistent with someone who would see, I don’t know, 25% GDP growth from their country’s imitation learners, and say, “these international AI agreements are too cautious and are holding us back from even more growth”; it seems much more likely to me that politicians’ appetite for risking great power conflict requires much worse economic conditions than that.
In cases 1 and 2, the threat is existential, and countries take big measures accordingly. So I think existing mechanisms for diplomacy and enforcement are powerful enough “coordination mechanisms” to stop highly-capitalized RL projects. I also object a bit to calling a solution here “strong global coordination”. If China makes a law preventing AI that would kill everyone with 1% probability if made, that’s rational for them to do regardless of whether the US does the same. We just need leaders to understand the risks, and we need them to be presiding over enough growth that they don’t need to take desperate action, and that seems doable.
Also, consider how much more state capacity AI-enabled states could have. It seems to me that a vast population of imitation learners (or imitations of populations of imitation learners) can prevent advanced RL from ever being developed, if the latter is illegal; the imitation learners don’t have to compete with advanced RL agents after those have been made. If there are well-designed laws against RL (beyond some level of capability), we would have plenty of time to put such enforcement in place.
By process-based RL, I mean: the reward for an action doesn’t depend on the consequences of executing that action. Instead it depends on some overseer’s evaluation of the action, potentially after reading justification or a debate about it or talking with other AI assistants or whatever. I think this has roughly the same risk profile as imitation learning, while potentially being more competitive.
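To make the distinction concrete, here is a minimal toy sketch in Python. Everything in it is a hypothetical illustration rather than anyone's actual training setup: the `Action` type, the two reward functions, the `toy_overseer`, and the two "worlds" are all made-up names. The only point is that the process-based reward never executes the action, so it cannot vary with what the action would cause.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    """A proposed action plus the justification shown to the overseer (hypothetical type)."""
    description: str
    justification: str


def outcome_based_reward(action: Action, execute: Callable[[Action], float]) -> float:
    # Outcome-based: the action is executed and the reward is its consequence.
    return execute(action)


def process_based_reward(action: Action, overseer: Callable[[Action], float]) -> float:
    # Process-based: the action is never executed; only the overseer's
    # evaluation of the proposal determines the reward.
    return overseer(action)


def toy_overseer(action: Action) -> float:
    # Assumption for illustration: the overseer gives 1.0 to any action
    # that comes with a non-empty justification, and 0.0 otherwise.
    return 1.0 if action.justification.strip() else 0.0


# Two hypothetical environments in which the same action turns out differently.
def lucky_world(action: Action) -> float:
    return 10.0


def unlucky_world(action: Action) -> float:
    return -10.0


plan = Action("launch a new product", "market research suggests demand")
worlds = (lucky_world, unlucky_world)

outcome_rewards = [outcome_based_reward(plan, world) for world in worlds]
process_rewards = [process_based_reward(plan, toy_overseer) for _ in worlds]

print("outcome-based:", outcome_rewards)
print("process-based:", process_rewards)
```

In this toy, the outcome-based reward changes with the world the action lands in, while the process-based reward is the same in both. A real overseer would of course be far richer than a one-line check; the sketch only shows where the reward signal comes from.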
I’m generally excited and optimistic about coordination. If you are just saying that AI non-proliferation isn’t that much harder than nuclear non-proliferation, then I think I’m with you. But I think (i) it’s totally fair to call that “strong global coordination,” and (ii) you would probably have to do a somewhat better job than we did with nuclear non-proliferation.
I think the technical question is usually going to be about how to trade off capability against risk. If you didn’t care about that at all, you could just not build scary ML systems. I’m saying that you should build smaller models with process-based RL.
It might be good to focus on legible or easy-to-enforce lines rather than just trading off capability vs risk optimally. But I don’t think that “no RL” is effective as a line—it still leaves you with a lot of reward-hacking (e.g. by planning against an ML model, or predicting what actions lead to a high reward, or expert iteration...). Trying to avoid all these things requires really tightly monitoring every use of AI, rather than just training runs. And I’m not convinced it helps significantly with deceptive alignment.
So in any event it seems like you are going to care about model size. “No big models” is also a way easier line to enforce. This is pretty much like saying “minimize the amount of black-box end-to-end optimization you do,” which feels like it gets closer to the heart of the issue.
If you are taking that approach, I think you would probably prefer to do process-based RL with smaller models, rather than imitation learning with bigger models (and will ultimately want to use outcomes in relatively safe ways). Yes it would be safer to use neither process-based RL nor big models, and just make your AI weaker. But the main purpose of technical work is to reduce how demanding the policy ask is—how much people are being asked to give up, how unstable the equilibrium is, how much powerful AI we can tolerate in order to help enforce or demonstrate necessity. Otherwise we wouldn’t be talking about these compromises at all—we’d just be pausing AI development now until safety is better understood.
I would quickly change my tune on this if e.g. we got some indication that process-based RL increased rather than decreased the risk of deceptive alignment at a fixed level of capability.
I think [process-based RL] has roughly the same risk profile as imitation learning, while potentially being more competitive.
I agree with this in a sense, although I may be quite a bit more harsh about what counts as “executing an action”. For example, if reward is based on an overseer talking about the action with a large group of people/AI assistants, then that counts as “executing the action” in the overseer-conversation environment, even if the action looks like it’s for some other environment, like a plan to launch a new product in the market. I do think myopia in this environment would suffice for existential safety, but I don’t know how much myopia we need.
If you’re always talking about myopic/process-based RLAIF when you say RLAIF, then I think what you’re saying is defensible. I speculate that not everyone reading this recognizes that your usage of RLAIF implies RLAIF with a level of myopia that matches current instances of RLAIF, and that that is a load-bearing part of your position.
I say “defensible” instead of fully agreeing because I weakly disagree that increasing compute is a more dangerous way to improve performance than modifying the objective to a new myopic objective. That is, I disagree with this:
I think you would probably prefer to do process-based RL with smaller models, rather than imitation learning with bigger models
You suggest that increasing compute is the last thing we should do if we’re looking for performance improvements, as opposed to adding a very myopic approval-seeking objective. I don’t see it. I think changing the objective from imitation learning is more likely to lead to problems than scaling up the imitation learners. But this is probably beside the point, because I don’t think problems are particularly likely in either case.
Advanced RL, like open borders + housing deregulation, guarantees vast economic growth in wealthy countries.
I think this comparison is imperfect. Standard economic models predict an acceleration in the growth rate by at least an order of magnitude, and usually more. Over one decade, an increase in economic capacity by 1-4 orders of magnitude seems probable. By contrast, my understanding was that the models of open borders roughly predict a one-time doubling of world GDP over several decades, and for housing, it’s something like a 50% increase in GDP over decades.
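A rough way to compare these figures is to convert each into the constant annual growth rate that would compound to the stated total. The sketch below does that in Python. Two things are assumptions for illustration only: "several decades" is taken to be 30 years, and each total is treated as smooth compounding rather than whatever path the underlying models actually predict.

```python
def annualized_growth(total_multiplier: float, years: float) -> float:
    """Constant annual growth rate that compounds to total_multiplier over years."""
    return total_multiplier ** (1.0 / years) - 1.0


# Assumption: "several decades" is taken to be 30 years, purely for illustration.
SEVERAL_DECADES = 30

scenarios = {
    "open borders (2x world GDP)": annualized_growth(2.0, SEVERAL_DECADES),
    "housing deregulation (1.5x GDP)": annualized_growth(1.5, SEVERAL_DECADES),
    "AI, low end (10x in a decade)": annualized_growth(10.0, 10),
    "AI, high end (10,000x in a decade)": annualized_growth(10_000.0, 10),
}

for name, rate in scenarios.items():
    print(f"{name}: {rate:.1%} per year")
```

Under these assumptions the two policy cases come out at roughly 2.3% and 1.4% of extra growth per year, while the AI range runs from roughly 26% to roughly 150% per year. The exact numbers move with the assumed time horizon, but the gap between the two groups does not close.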
Perhaps a better way to put this is that if AI is developed anywhere, even in a small country, that country could soon (within 10 years) grow to be the world’s foremost economic power. Nothing comparable seems true for other policies. There only really needs to be one successful defecting nation for this coordination to fall apart.
What is process-based RL?