I agree that, as a community, it's important to have an ongoing discussion about how best to shape research effort and public advocacy to maximally reduce X-risk. But this post feels off the mark to me. Consider your bullet list of sources of risk not alleviated by AI control. You proffer that this list makes up a much larger portion of the probability space than misalignment or deception, and that claim is the pivot point in your decision not to support investing research resources in AI control.
You list seven things. The first three aren't addressable by any technical research or solution: corporate leaders might be greedy, hubristic, and/or reckless, or human organizations might not be nimble enough to develop and deploy the safest systems we are technically capable of. No safety research portfolio addresses those risks. The other four are potential failures by us as a technical community, and they apply broadly: if too high a percentage of the people in our space are bad statisticians, can't think distributionally, are lazy or prideful, or don't understand causal reasoning well enough, that will doom every direction of AI safety research, not just AI control.
So in my estimation, if we grant that it is important to do AI safety research at all, we can ignore the entire list when estimating the value of AI control relative to other investments of research resources. You don't argue against the proposition that controlling misalignment in early AGI is a plausible path to controlling ASI. You gesture at it at the start, but you never argue against the plausibility of the path where AI control of early AGI enables collaboration that produces safe ASI through some variety of methods. You just list a bunch of reasons (entirely plausible, and I agree not unlikely) why all attempts at preventing catastrophic risk might fail. I don't understand why you single out AI control when other technical AI safety sub-fields are subject to all the same failure modes.
I don’t think your first paragraph applies to the first three bullets you listed.
Leaders don't even bother to ask researchers to leverage the company's current frontier model to help with what is hopefully a company-wide effort to reduce risk from the ASI model that's coming? That's a leadership problem, not a lack-of-technical-understanding problem. I suppose if you imagine that a company could achieve fine-grained mechanistic understanding of everything their early AGI model does, then leaders would be more likely to ask because they'd expect it to be easier and faster. But we all know we're almost certainly not going to have that understanding. Not asking would still just be a leadership problem.
Leaders ask the alignment team to safety-wash? Also a leadership problem.
The org can't implement the good alignment solutions its researchers devise? Again, given that we already know we're almost certainly not going to have comprehensive mechanistic understanding of early-AGI models, I don't understand how shifts in the investment portfolio of technical AI safety research affect this. It still seems like a leadership problem, unrelated to the percentages next to each sub-field in the research portfolio.
Which leads me to your last paragraph. Why write a whole post against AI control in this context? Is your claim that there are sub-fields of technical AI safety research that are significantly less threatened by your seven bullets and that offer a plausible path to minimizing catastrophic AI risk? That we shouldn't bother with technical AI safety research at all? Something else?