Ok, coming back to this a few hours later, here is roughly what feels off to me. The key thing is that the priority should be “developing measurement tools that tell us with high confidence whether systems are benign”, but Miles somehow frames this as “measurement tools that tell us with high confidence that AI systems are benign”. That sounds confused in the same way that a scientist saying “what we need are experiments that confirm my theory” sounds kind of confused. (I think there are important differences between Bayesian and scientific evidence, and a scientist can have justified confidence before the scientific community will accept their theory. But still, something seems wrong when a scientist says the top priority is to develop experiments that confirm their theory, as opposed to “experiments that tell us whether my theory is right”.)
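To make the “whether” vs. “that” distinction concrete, here is a toy Bayesian sketch (the numbers and the helper function are made up purely for illustration, not taken from anything Miles wrote): a test that nearly any system would pass barely moves the posterior that a given system is benign, while a test that benign and non-benign systems would pass at very different rates moves it a lot, in either direction.

```python
# Toy illustration (all numbers invented): compare a "confirmation-oriented" test,
# which almost every system passes, with a "discriminating" test, which benign
# systems tend to pass and non-benign systems tend to fail.
# Posterior odds = prior odds * likelihood ratio.

def posterior_p_benign(prior_p_benign, p_pass_given_benign, p_pass_given_not_benign):
    """Posterior probability that the system is benign, given that it passed the test."""
    prior_odds = prior_p_benign / (1 - prior_p_benign)
    likelihood_ratio = p_pass_given_benign / p_pass_given_not_benign
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

prior = 0.5

# A test built to confirm: nearly everything passes, so passing tells us little.
print(posterior_p_benign(prior, 0.99, 0.90))  # ~0.52

# A test built to discriminate: passing is strong evidence, and failing would be too.
print(posterior_p_benign(prior, 0.95, 0.05))  # ~0.95
```

The point is just that the value of a measurement tool comes from how differently benign and non-benign systems would score on it, not from which answer it is expected to give.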
Edit: There is a similar post by Wei Dai a long time ago that I had some similar feelings about: https://www.lesswrong.com/posts/dt4z82hpvvPFTDTfZ/six-ai-risk-strategy-ideas#_Generate_evidence_of_difficulty__as_a_research_purpose

> “Generate evidence of difficulty” as a research purpose
>
> How to handle the problem of AI risk is one of, if not the most important and consequential strategic decisions facing humanity. If we err in the direction of too much caution, in the short run resources are diverted into AI safety projects that could instead go to other x-risk efforts, and in the long run, billions of people could unnecessarily die while we hold off on building “dangerous” AGI and wait for “safe” algorithms to come along. If we err in the opposite direction, well presumably everyone here already knows the downside there.
>
> A crucial input into this decision is the difficulty of AI safety, and the obvious place for decision makers to obtain evidence about the difficulty of AI safety is from technical AI safety researchers (and AI researchers in general), but it seems that not many people have given much thought on how to optimize for the production and communication of such evidence (leading to communication gaps like this one). (As another example, many people do not seem to consider that doing research on a seemingly intractably difficult problem can be valuable because it can at least generate evidence of difficulty of that particular line of research.)
But I do think that section of the post handles the tradeoff a lot better, and gives me a lot less of the “something is off” vibes.