I don’t think your first paragraph applies to the first three bullets you listed.
Leaders don’t even bother to ask researchers to leverage the company’s current frontier model to help with what is hopefully a company-wide effort to reduce risk from the ASI model that’s coming? That’s a leadership problem, not a lack-of-technical-understanding problem. I suppose if you imagine a company could get to fine-grained mechanistic understanding of everything its early-AGI model does, then leaders would be more likely to ask because they’d expect it to be easier/faster? But we all know we’re almost certainly not going to have that understanding, so not asking would still just be a leadership problem.
Leaders ask the alignment team to safety-wash? Also a leadership problem.
The org can’t implement the good alignment solutions its researchers devise? Again, given that we all already know we’re almost certainly not going to have comprehensive mechanistic understanding of early-AGI models, I don’t see how shifts in the investment portfolio of technical AI safety research affect this. It still just seems like a leadership problem, unrelated to the percentages next to each sub-field in the research portfolio.
Which leads me to your last paragraph. Why write a whole post against AI control in this context? Is your claim that there are sub-fields of technical AI safety research that are significantly less threatened by your 7 bullets and still offer a plausible way to minimize catastrophic AI risk? That we shouldn’t bother with technical AI safety research at all? Something else?
This, along with @Andrew Mack’s MELBO and DCT work, is super cool and promising! One question: have you explored altering the discovered vectors that produce meaningful, non-gibberish changes, to see if you can find something like a minimal viable direction? Perhaps by taking successful vectors and individually re-optimizing them while turning down the L2 norm, to see whether some dimensions preferentially maintain their magnitude?
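To be concrete, here’s a rough sketch of the kind of experiment I’m imagining. It’s not your actual setup: a toy quadratic objective stands in for whatever objective the vectors were originally optimized against, and the dimension, norms, and function names are all made up for illustration. The idea is just to re-optimize a discovered vector under a progressively smaller L2-norm budget and check which dimensions hold onto their magnitude.

```python
# Toy sketch: re-optimize a "discovered" steering vector at shrinking L2 norms
# and see which dimensions preferentially keep their magnitude.
import torch

torch.manual_seed(0)
D = 512                                        # toy residual-stream dimension
BASIS = torch.linalg.qr(torch.randn(D, 8)).Q   # toy 8-dim "behavior" subspace

def behavior_score(v: torch.Tensor) -> torch.Tensor:
    # Stand-in objective: squared projection onto the toy subspace. In practice
    # this would add v to the residual stream and measure the elicited behavior.
    return (BASIS.T @ v).pow(2).sum()

def reoptimize_at_norm(v0: torch.Tensor, target_norm: float, steps: int = 200) -> torch.Tensor:
    """Re-optimize starting from v0 while constrained to an L2 norm of target_norm."""
    v = (v0 * (target_norm / v0.norm())).detach().requires_grad_(True)
    opt = torch.optim.Adam([v], lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        (-behavior_score(v)).backward()        # gradient ascent on the objective
        opt.step()
        with torch.no_grad():
            v.mul_(target_norm / v.norm())     # project back onto the norm sphere
    return v.detach()

# Pretend v0 is a successful vector found by the original optimization.
v0 = torch.randn(D)
v0 = v0 / v0.norm() * 10.0

for frac in [1.0, 0.5, 0.25, 0.1]:
    v = reoptimize_at_norm(v0, target_norm=frac * v0.norm().item())
    # Dimensions whose retention sits well above `frac` kept their magnitude
    # preferentially as the norm budget shrank.
    retention = v.abs() / (v0.abs() + 1e-8)
    top_dims = torch.topk(retention, k=10).indices.tolist()
    print(f"norm x{frac}: score={behavior_score(v).item():.3f}, top-retaining dims={top_dims}")
```

If a small set of dimensions keeps most of its magnitude as the budget shrinks while the elicited behavior survives, that seems like a natural candidate for a “minimal viable direction.”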