Did you actually read the rest of that post? Because the entire point was to talk about ways iterative design fails other than fast takeoff and the standard deceptive alignment story.
I skimmed the rest, but it mostly seems to be about how particular alignment techniques (e.g. RLHF) may fail, or about the difficulty and importance of measurement, neither of which I have much disagreement with. Also, in general, the evidence required to convince me of some core problem with iteration would have to be enormous, as iteration is inherent to all evolutionary processes (biological or technological).
If we can’t detect the problems just by seeing what the system does, then iteration alone will not fix the problems, no matter how safe it is to iterate. In such cases, the key thing is to expand the range of problems we can detect.
Yes. Again, (safe) iteration is necessary but not sufficient. A wind tunnel isn't a solution for aerodynamic control; rather, it's a key enabling catalyst. You also need careful, complete tests for alignment, various ways to measure it, etc.