This broadly seems right. Some details:
The “explain why that strategy wouldn’t work” step specifically takes the form of “describing a way the world could be where that strategy demonstrably doesn’t work” (rather than more heuristic arguments).
Once we have a proposal for which we have tried really hard to come up with situations where it could demonstrably fail, and can’t think of any, we will probably need to do lots of empirical work to figure out whether we can implement it and whether it actually works in practice. But we hope that this exercise will teach us a lot about the nature of the empirical work we’ll need to do, as well as giving us more confidence that the strategy will generalize beyond what we are able to test in practice. (For example, ELK was highlighted as a problem in the first place after ARC researchers thought a lot about possible failure modes of iterated amplification.)