Thanks for the links! (That goes for Wei and Paul too.)
> a group of researchers are beginning to think that in a broader sense “adversarial vulnerability” and “amount of test set error” are inextricably linked in a deep and foundational way—that they may not even be two separate problems.
I’d expect this to be true or false depending on the shape of the misclassified region. If you think of the input space as a white sheet, and the misclassified region as red polka dots, then we measure test error by throwing a dart at the sheet and checking if it hits a polka dot. To measure adversarial vulnerability, we take a dart that landed on a white part of the sheet and check the distance to the nearest red polka dot. If the sheet is covered in tiny red polka dots, this distance will be small on average. If the sheet has just a few big red polka dots, this will be larger on average, even if the total amount of red is the same.
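To make the dart picture concrete, here is a quick Monte Carlo toy (my own sketch, not taken from any of the linked papers): two unit-square “sheets” with the same total red area, one covered in many tiny dots and one with a few big dots. The dart-hit rate (test error) comes out about the same for both, but the average distance from a white dart to the nearest red region does not.

```python
# Toy Monte Carlo version of the sheet/polka-dot picture.  Two "sheets" on the
# unit square have the same total red area: one has many tiny red dots, the
# other a few big ones.  "Test error" is the chance a random dart lands in red;
# "adversarial distance" is how far a white dart is from the nearest red dot.
import numpy as np

rng = np.random.default_rng(0)

def make_dots(n_dots, total_area):
    # n_dots circles with identical radii, sized so the total red area is fixed.
    radius = np.sqrt(total_area / (n_dots * np.pi))
    centers = rng.uniform(0, 1, size=(n_dots, 2))
    return centers, radius

def measure(centers, radius, n_darts=50_000):
    darts = rng.uniform(0, 1, size=(n_darts, 2))
    # Signed distance to red: negative means the dart landed inside a dot.
    d = np.linalg.norm(darts[:, None, :] - centers[None, :, :], axis=-1).min(axis=1) - radius
    test_error = np.mean(d < 0)
    adv_distance = d[d > 0].mean()   # average over the white darts only
    return test_error, adv_distance

for n_dots in (400, 4):   # many tiny dots vs. a few big dots, same red area
    err, dist = measure(*make_dots(n_dots, total_area=0.02))
    print(f"{n_dots:4d} dots: test error ~ {err:.3f}, mean distance to red ~ {dist:.3f}")
```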
As a concrete example, suppose we trained a 1-nearest-neighbor classifier for 2-dimensional RGB images. Then the sheet is mostly red (because this is a terrible model), but there are splotches of white associated with each image in our training set. So this is a model with lots of test error that nonetheless contains many balls (around the training images) with 0% misclassification.
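A minimal numpy sketch of that situation (toy random vectors standing in for images, with arbitrary labels; none of this is from the papers above): test error is roughly 50%, yet every point in a tiny ball around a training image gets that image’s label, so those balls contain no misclassifications.

```python
# Minimal 1-nearest-neighbor sketch (toy data, not real images) illustrating
# the point above: fresh test points are frequently misclassified, yet every
# point within a small ball around a training image gets that image's label,
# so those balls contain 0% misclassifications.
import numpy as np

rng = np.random.default_rng(0)
dim, n_train, n_test = 12, 20, 500           # "images" are just random vectors here

X_train = rng.uniform(0, 1, size=(n_train, dim))
y_train = rng.integers(0, 2, size=n_train)   # arbitrary binary labels

def predict_1nn(X):
    d = np.linalg.norm(X[:, None, :] - X_train[None, :, :], axis=-1)
    return y_train[d.argmin(axis=1)]

# High test error: the labels are random, so 1-NN is right only ~half the time.
X_test = rng.uniform(0, 1, size=(n_test, dim))
y_test = rng.integers(0, 2, size=n_test)
print("test error:", np.mean(predict_1nn(X_test) != y_test))

# But tiny perturbations of a training image never change its prediction.
eps = 1e-3
perturbed = X_train[0] + rng.uniform(-eps, eps, size=(1000, dim))
print("misclassified near training image 0:", np.mean(predict_1nn(perturbed) != y_train[0]))
```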
To measure the size of the polka dots, you could invert the typical adversarial perturbation procedure: Start with a misclassified input and find the minimal perturbation necessary to make it correctly classified.
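Here is a hedged sketch of that inverted procedure, using a made-up linear classifier so the minimal perturbation has a closed form; for a deep net you would substitute an iterative attack run toward the true label instead.

```python
# Sketch of the "inverted" procedure: start from a *misclassified* input and
# find the smallest perturbation that makes it *correctly* classified.  For a
# linear classifier sign(w·x + b) the minimal L2 move is straight along w,
# with length |w·x + b| / ||w||.
import numpy as np

rng = np.random.default_rng(0)
dim = 10
w, b = rng.normal(size=dim), 0.3              # a made-up linear classifier sign(w·x + b)

# Construct an input the classifier gets wrong: its true label is +1, but it
# sits on the negative side of the decision boundary.
x = -w + 0.1 * rng.normal(size=dim)
y_true = 1
margin = w @ x + b
assert np.sign(margin) != y_true              # confirm it is misclassified

# Minimal L2 perturbation that flips it to the correct side: move along w,
# just past the boundary (the 1.001 factor lands it on the correct side).
delta = (-margin / (w @ w)) * w * 1.001
x_fixed = x + delta
print("distance from misclassified point to correct classification:", np.linalg.norm(delta))
print("now classified as:", int(np.sign(w @ x_fixed + b)))
```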
(It’s possible that this sheet analogy is misleading due to the nature of high-dimensional spaces.)
Anyway, this relates back to the original topic of conversation: the extent to which capabilities research and safety research are separate. If “adversarial vulnerability” and “amount of test set error” are inextricably linked, that suggests that reducing test set error (“capabilities” research) improves safety, and addressing adversarial vulnerability (“safety” research) advances capabilities. The extreme version of this position is that software advances are all good and hardware advances are all bad.
(As a note—not trying to be snarky, just trying to be genuinely helpful: Cubuk et al. 2017 and Goodfellow et al. 2014 are my top two hits for “adversarial examples linearity” in an incognito tab.)
Thanks. I’d seen both papers, but I don’t like linking to things I haven’t fully read.
I might just be confused, but this sentence seems like a non sequitur to me. I understood catherio to be responding to your comment about googling and not finding “papers or blog posts supporting [the claim that deep learning is not unusually susceptible to adversarial examples]”.
If that was already clear to you then, never mind. I was just confused why you were talking about linking to things, when before the question seemed to be about what could be found by googling.
Oh, that makes sense.