Maybe—I can see it being spun in two ways:
1. The AI safety/alignment crowd was irrationally terrified of chatbots/current AI, forced everyone to pause, and then, unsurprisingly, didn’t find anything scary
2. The AI safety/alignment crowd needed time for their alignment techniques to catch up with current models before things get dangerous in the future, and they did that
To point (1): alignment researchers aren’t terrified of GPT-4 taking over the world, wouldn’t agree with this characterization, and aren’t communicating this to others. I don’t expect this is how things will be interpreted if people are being fair.
I think (2) is the realistic spin, and it could go wrong reputationally (like in the examples you showed) if there’s no interesting scientific alignment progress made during the pause. I don’t expect a lack of interesting progress, though. There’s plenty of unexplored work in interpretability alone that could yield many low-hanging-fruit results. This is something I naturally expect from a young field with a huge space of unexplored empirical and theoretical questions. If there’s plenty of alignment research output during that time, then I’m not sure the pause will really be seen as a failure.
I’m in favor of interventions to try to change that aspect of our situation
Yeah, agree. I’d say one of the best ways to do this is to make it clear what the purpose of the pause is and to define what counts as the pause being a success (e.g. significant research output).
Also, your pro-pause points seem quite important, in my opinion, and outweigh the ‘reputational risks’ by a lot:
Pro-pause: It’s “practice for later”, “policy wins beget policy wins”, etc., so it will be easier next time
Pro-pause: Needless to say, maybe I’m wrong and LLMs won’t plateau!
I’d honestly find it a bit surprising if the likely reaction to this were to dismiss future coordination on AI safety. “Pausing to catch up alignment work” doesn’t seem like the kind of thing that leads the world to think “AI can never be existentially dangerous” and makes future coordination harder. If AI keeps getting more impressive than today’s SOTA, I’m not really sure risk concerns will easily go away.