(COI note: I work at OpenAI. These are my personal views, though.)
My quick take on the “AI pause debate”, framed in terms of two scenarios for how the AI safety community might evolve over the coming years:
AI safety becomes the single community that’s the most knowledgeable about cutting-edge ML systems. The smartest up-and-coming ML researchers find themselves constantly coming to AI safety spaces, because that’s the place to go if you want to nerd out about the models. It feels like the early days of hacker culture. There’s a constant flow of ideas and brainstorming in those spaces; the core alignment ideas are standard background knowledge for everyone there. There are hackathons where people build fun demos, and people figuring out ways of using AI to augment their research. Constant interactions with the models allow people to gain really good hands-on intuitions about how they work, which they leverage into doing great research that helps us actually understand them better. When the public ends up demanding regulation, there’s a large pool of competent people who are broadly reasonable about the risks, and can slot into the relevant institutions and make them work well.
AI safety becomes much more similar to the environmentalist movement. It has broader reach, but alienates a lot of the most competent people in the relevant fields. ML researchers who find themselves in AI safety spaces are told they’re “worse than Hitler” (which happened to a friend of mine, actually). People get deontological about AI progress: some hesitate to pay for ChatGPT because it feels like they’re contributing to the problem (another true story); the dynamics around this look similar to environmentalists refusing to fly places. Others overemphasize the risks of existing models in order to whip up popular support. People are sucked into psychological doom spirals similar to how many environmentalists think about climate change: if you’re not depressed then you obviously don’t take it seriously enough. Just like environmentalists often block some of the most valuable work on fixing climate change (e.g. nuclear energy, geoengineering, land use reform), safety advocates block some of the most valuable work on alignment (e.g. scalable oversight, interpretability, adversarial training) due to acceleration or misuse concerns. Of course, nobody will say they want to dramatically slow down alignment research, but there will be such high barriers to researchers getting and studying the relevant models that it has similar effects. The regulations that end up being implemented are messy and full of holes, because the movement is more focused on making a big statement than figuring out the details.
Obviously I’ve exaggerated and caricatured these scenarios, but I think there’s an important point here. One really good thing about the AI safety movement, until recently, is that the focus on the problem of technical alignment has nudged it away from the second scenario (although it wasn’t particularly close to the first scenario either, because the “nerding out” was typically more about decision theory or agent foundations than ML itself). That’s changed a bit lately, in part because a bunch of people seem to think that making technical progress on alignment is hopeless. I think this is just not an epistemically reasonable position to take: history is full of cases where even leading experts dramatically underestimated the growth of scientific knowledge, and its ability to solve big problems. Either way, I do think public advocacy for strong governance measures can be valuable, but I also think that “pause AI” advocacy runs the risk of pushing us towards scenario 2. Even if you think that’s a cost worth paying, I’d urge you to think about ways to get the benefits of the advocacy while reducing that cost and keeping the door open for scenario 1.
FYI I think this is worth fleshing out into a top level post (esp. given that it’s ‘Pause Debate’ week).
I’m not actually sure it needs much fleshing out. I think the main bit here that feels unjustified, or insufficiently-justified for the strength of the claim, is:
That’s changed a bit lately, in part because a bunch of people seem to think that making technical progress on alignment is hopeless. I think this is just not an epistemically reasonable position to take: history is full of cases where people dramatically underestimated the growth of scientific knowledge, and its ability to solve big problems.
Have edited slightly to clarify that it was “leading experts” who dramatically underestimated it. I’m not really sure what else to say, though...
I think I basically agree with you, Richard, about the risks of falling into scenario 2, and think this is a wise comment, but I also think you are strawmanning the reason for the change: it’s not that people have come to think that making technical progress is hopeless (or even harder than it used to be!); it’s rather that people have come to have shorter timelines, and so the probability that sufficient technical progress will be made in time has gone down, and the usefulness of calling for a pause has gone up. (e.g. if you think AGI is 15 years away, then pausing now is plausibly useless or even harmful.)
That’s my theory at any rate. And it’s sorta what I think, I think.
Oh, also, I think that the counterproductive rot in environmentalism probably took at least 5 years to build up, so I’m hopeful that even if we are on a path to rot, it’ll take too long for the rot to build up to matter. But this is just a guess about the growth rate of rot, informed by anecdotes like the stories about what happened to your friends; over the coming months and years, more data will come in to better calibrate that guess.
I appreciated seeing the caricature of scenario 1 presented, at least. As a dream, it feels attainable, or like it might have been in some other timeline, but perhaps in ours too, for all I know.