To make a useful version of this post I think you need to get quantitative.
I think we should try to slow down the development of unsafe AI. And so all else equal I think it’s worth picking research topics that accelerate capabilities as little as possible. But it just doesn’t seem like a major consideration when picking safety projects. (In this comment I’ll just address that; I also think the case in the OP is overstated for the more plausible claim that safety-motivated researchers shouldn’t work directly on accelerating capabilities.)
A simple way to model the situation is:
There are A times as many researchers who work on existential safety as there are total researchers in AI. I think a very conservative guess is something like A = 1% or 10% depending on how broadly you define it; a more sophisticated version of this argument would focus on more narrow types of safety work.
If those researchers don’t worry about capabilities externalities when they choose safety projects, they accelerate capabilities by about B times as much as if they had focused on capabilities directly. I think a plausible guess for this is like B = 10%.
The badness of accelerating capabilities by 1 month is C times larger than the goodness of accelerating safety research by 1 month. I think a reasonable conservative guess for this today is like 10. It depends on how much other “good stuff” you think is happening in the world that is similarly important to safety progress on reducing the risk from AI—are there 2 other categories, or 10 other categories, or 100 other categories? This number will go up as the broader world starts responding and adapting to AI more; I think in the past it was much less than 10 because there just wasn’t that much useful preparation happening.
The ratio of (capabilities externalities) / (safety impact) is something like A x B x C, which is like 1-10% of the value of safety work. This suggests that capabilities externalities can be worth thinking about.
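To make the arithmetic explicit, here is a minimal sketch of that product (the script and variable names are just mine for illustration; the parameter values are the guesses given above):

```python
# Back-of-the-envelope: (capabilities externality) / (safety impact) = A * B * C
B = 0.10   # capabilities acceleration relative to working on capabilities directly
C = 10     # badness of 1 month of capabilities speedup vs 1 month of safety speedup

for A in (0.01, 0.10):   # safety researchers as 1% or 10% of all AI researchers
    ratio = A * B * C
    print(f"A = {A:.0%}: externality is ~{ratio:.0%} of the value of the safety work")
# -> ~1% and ~10%, i.e. the "1-10%" range above.
```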
There are lots of other corrections which I think generally point further in the direction of not worrying. For example, in this estimate we said that AI safety research is 10% of all the important safety-improving stuff happening in the world, but we implicitly assumed that capabilities research is 100% of all the timelines-shortening stuff happening in the world. If we treated those two questions symmetrically, I would argue that capabilities research is <30% of all the timelines-shortening stuff (including e.g. compute progress, experience with products, human capital growth...). So my personal estimate for the capabilities externality is more like 1% of the value of safety progress.
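Folding in that correction, the same back-of-envelope looks like this (the 30% share and the symmetry assumption are just the guesses from the paragraph above):

```python
# Treat capabilities research as <30% of all timelines-shortening activity,
# just as safety research was treated as a fraction of all safety-improving
# activity (that second fraction is already baked into C above).
B, C = 0.10, 10
capabilities_share = 0.30   # the "<30%" guess

for A in (0.01, 0.10):
    adjusted = A * B * C * capabilities_share
    print(f"A = {A:.0%}: adjusted externality ~{adjusted:.1%} of the value of the safety work")
# -> roughly 0.3%-3%, consistent with the "more like 1%" estimate above.
```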
And on top of that, if you totally avoid 30% of possible safety topics, the total cost is much larger than just reducing safety output by 30%: there are generally diminishing returns, stepping-on-toes effects, and so on. I'd guess it's at least 2-4x worse than scaling down everything.
There are lots of ways you could object to this, but ultimately it seems to come down to quantitative claims:
My sense is that the broader safety community thinks parameter B is much larger than I do, and this is the most substantive and interesting disagreement. People seem to have an estimate more like 100%, and when restricting to "good alignment work" they often seem to think more like 10x. Overall, if I believed B = 1 then this would be worth thinking about, and if you took B = 10 then it would become a major consideration in project choice.
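To see how much the estimate hinges on B, here is the same product with B varied across the three estimates just mentioned (my 10%, the ~100% estimate, and the ~10x estimate), holding A at the higher-end guess of 10% and C at 10; this is again only an illustration of the numbers above:

```python
A, C = 0.10, 10   # higher-end guess for A, and the conservative C from above
for B in (0.1, 1, 10):
    print(f"B = {B}: externality ~{A * B * C:.0%} of the value of the safety work")
# -> ~10% (minor), ~100% (worth thinking about), ~1000% (a major consideration).
```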
In this post Nate explains his view that alignment progress is serial while capabilities progress is parallelizable, but I disagree, and he doesn't really present any evidence or argument for it. Of course you could also list other considerations, not captured in the simple model, that you think are important.
I think the most common objection is just that the supposed safety progress isn't actually delivering safety. But then "this doesn't help" is really the crux: the work would be bad even if it weren't net negative. Arguing about net negativity is really burying the lede.