catherio comments on Critch on career advice for junior AI-x-risk-concerned researchers

catherio 19 May 2018 3:21 UTC
17 points
FWIW, this claim doesn’t match my intuition, and googling around, I wasn’t able to quickly find any papers or blog posts supporting it.
“Explaining and Harnessing Adversarial Examples” (Goodfellow et al. 2014) is the original demonstration that “Linear behavior in high-dimensional spaces is sufficient to cause adversarial examples”.
I’ll emphasize that high-dimensionality is a crucial piece of the puzzle, which I haven’t seen you bring up yet. You may already be aware of this, but I’ll emphasize it anyway: the usual intuitions do not even remotely apply in high-dimensional spaces. Check out Counterintuitive Properties of High Dimensional Space.
adversarial examples are only a thing because the wrong decision boundary has been learned
In my opinion, this is spot-on—not only your claim that there would be no adversarial examples if the decision boundary were perfect, but in fact a group of researchers are beginning to think that in a broader sense “adversarial vulnerability” and “amount of test set error” are inextricably linked in a deep and foundational way—that they may not even be two separate problems. Here are a few citations that point at some pieces of this case:
- “Adversarial Spheres” (Gilmer et al. 2017) - “For this dataset we show a fundamental tradeoff between the amount of test error and the average distance to nearest error. In particular, we prove that any model which misclassifies a small constant fraction of a sphere will be vulnerable to adversarial perturbations of size O(1/√d).” (emphasis mine)
  - I think this paper is truly fantastic in many respects.
  - The central argument can be understood from the intuitions presented in Counterintuitive Properties of High Dimensional Space in the section titled Concentration of Measure (Figure 9). Where it says “As the dimension increases, the width of the band necessary to capture 99% of the surface area decreases rapidly.” you can just replace that with the “As the dimension increases, a decision-boundary hyperplane that has 1% test error rapidly gets extremely close to the equator of the sphere”. “Small distance from the center of the sphere” is what gives rise to “Small epsilon at which you can find an adversarial example”.
- “Intriguing Properties of Adversarial Examples” (Cubuk et al. 2017) - “While adversarial accuracy is strongly correlated with clean accuracy, it is only weakly correlated with model size”
  - I haven’t read this paper, but I’ve heard good things about it.
To summarize, my belief is that any model that is trying to learn a decision boundary in a high-dimensional space, and is basically built out of linear units with some nonlinearities, will be susceptible to small-perturbation adversarial examples so long as it makes any errors at all.
(As a note—not trying to be snarky, just trying to be genuinely helpful, Cubuk et al. 2017 and Goodfellow et al. 2014 are my top two hits for “adversarial examples linearity” in an incognito tab)
- ESRogs 19 May 2018 7:36 UTC
  10 points
  Parent
  As the dimension increases, a decision-boundary hyperplane that has 1% test error rapidly gets extremely close to the equator of the sphere
  What does the center of the sphere represent in this case?
  (I’m imaging the training and test sets consisting of points in a highly dimensional space, and the classifier as drawing a hyperplane to mostly separate them from each other. But I’m not sure what point in this space would correspond to the “center”, or what sphere we’d be talking about.)
- ESRogs 19 May 2018 7:28 UTC
  10 points
  Parent
  The central argument can be understood from the intuitions presented in Counterintuitive Properties of High Dimensional Space in the section titled Concentration of Measure
  Thanks for this link, that is a handy reference!
- ESRogs 19 May 2018 7:05 UTC
  2 points
  Parent
  “Adversarial Spheres” (Gilmer et al. 2017) - “For this dataset we show a fundamental tradeoff between the amount of test error and the average distance to nearest error. In particular, we prove that any model which misclassifies a small constant fraction of a sphere will be vulnerable to adversarial perturbations of size O(1/√d).” (emphasis mine)
  Slightly off-topic, but quick terminology question. When I first read the abstract of this paper, I was very confused about what it was saying and had to re-read it several times, because of the way the word “tradeoff” was used.
  I usually think of a tradeoff as a inverse relationship between two good things that you want both of. But in this case they use “tradeoff” to refer to the inverse relationship between “test error”, and “average distance to nearest error”. Which is odd, because the first of those is bad and the second is good, no?
  Is there something I’m missing that causes this to sound like a more natural way of describing things to others’ ears?
- John_Maxwell 19 May 2018 5:56 UTC
  1 point
  Parent
  Thanks for the links! (That goes for Wei and Paul too.)
  
  a group of researchers are beginning to think that in a broader sense “adversarial vulnerability” and “amount of test set error” are inextricably linked in a deep and foundational way—that they may not even be two separate problems.
  
  I’d expect this to be true or false depending on the shape of the misclassified region. If you think of the input space as a white sheet, and the misclassified region as red polka dots, then we measure test error by throwing a dart at the sheet and checking if it hits a polka dot. To measure adversarial vulnerability, we take a dart that landed on a white part of the sheet and check the distance to the nearest red polka dot. If the sheet is covered in tiny red polka dots, this distance will be small on average. If the sheet has just a few big red polka dots, this will be larger on average, even if the total amount of red is the same.
  
  As a concrete example, suppose we trained a 1-nearest-neighbor classifier for 2-dimensional RGB images. Then the sheet is mostly red (because this is a terrible model), but there are splotches of white associated with each image in our training set. So this is a model that has lots of test error despite many spheres with 0% misclassifications.
  
  To measure the size of the polka dots, you could invert the typical adversarial perturbation procedure: Start with a misclassified input and find the minimal perturbation necessary to make it correctly classified.
  
  (It’s possible that this sheet analogy is misleading due to the nature of high-dimensional spaces.)
  
  Anyway, this relates back to the original topic of conversation: the extent to which capabilities research and safety research are separate. If “adversarial vulnerability” and “amount of test set error” are inextricably linked, that suggests that reducing test set error (“capabilities” research) improves safety, and addressing adversarial vulnerability (“safety” research) advances capabilities. The extreme version of this position is that software advances are all good and hardware advances are all bad.
  
  (As a note—not trying to be snarky, just trying to be genuinely helpful, Cubuk et al. 2017 and Goodfellow et al. 2014 are my top two hits for “adversarial examples linearity” in an incognito tab)
  
  Thanks. I’d seen both papers, but I don’t like linking to things I haven’t fully read.
  - ESRogs 19 May 2018 7:12 UTC
    6 points
    Parent
    Thanks. I’d seen both papers, but I don’t like linking to things I haven’t fully read.
    I might just be confused, but this sentence seems like a non sequitur to me. I understood catherio to be responding to your comment about googling and not finding “papers or blog posts supporting [the claim that deep learning is not unusually susceptible to adversarial examples]”.
    If that was already clear to you then, never mind. I was just confused why you were talking about linking to things, when before the question seemed to be about what could be found by googling.
    - John_Maxwell 19 May 2018 9:14 UTC
      5 points
      Parent
      Oh, that makes sense.