Ah, “The tool-assisted attack went from taking 13 minutes to taking 26 minutes per example.”
Interesting. Changing the in-distribution (3oom) does not influences much the out-distribution (*2)
I think that 3 orders of magnitude is the comparison between “time taken to find a failure by randomly sampling” and “time taken to find a failure if you are deliberately looking using tools.”
I read Oli’s comment as referring to the 2.4% → 0.002% failure rate improvement from filtering.
Ah, that makes sense. But the 26 minutes --> 13 minutes is from adversarial training holding the threshold fixed, right?
Indeed. (Well, holding the quality degradation fixed, which causes a small change in the threshold.)
Ah, “The tool-assisted attack went from taking 13 minutes to taking 26 minutes per example.”
Interesting. Changing the in-distribution (3oom) does not influences much the out-distribution (*2)
I think that 3 orders of magnitude is the comparison between “time taken to find a failure by randomly sampling” and “time taken to find a failure if you are deliberately looking using tools.”
I read Oli’s comment as referring to the 2.4% → 0.002% failure rate improvement from filtering.
Ah, that makes sense. But the 26 minutes --> 13 minutes is from adversarial training holding the threshold fixed, right?
Indeed. (Well, holding the quality degradation fixed, which causes a small change in the threshold.)