However, remember how susceptible current DL models can be to adversarial examples, even when the adversarial examples have no perceptible difference from non-adversarial examples as far as humans can tell. That means that something is going on in DL systems that is qualitatively very different from how human brains process information. Something that makes them fragile in a way that is hard to anthropomorphize. Something alien.
That is highly debatable. There has been work on constructing adversarial examples for human brains, and some interesting demonstrations of considerable neural-level control even with our extremely limited ability to observe brains (ie. far short of ‘know every single parameter in the network exactly and are able to calculate exact network-wide gradients for it or a similar network’), and theoretical work arguing that adversarial examples are only due to the most obvious way that current DL models differ from human brains—being much, much, much smaller.
There has been work on constructing adversarial examples for human brains, and some interesting demonstrations of considerable neural-level control even with our extremely limited ability to observe brains
Do you have a source for this? I would be interested in looking into it. I could see this happening for isolated neurons, at least, but it would be remarkable if it could be done for whole circuits in vivo.
Does this go beyond just manipulating how our brains process optical illusions? I don’t see how the brain would perceive the type of pixel-level adversarial perturbations most of us think of (e.g.: https://openai.com/blog/adversarial-example-research/) as anything other than noise, if it even reaches past the threshold of perception at all. The sorts of illusions humans fall prey to are qualitatively different, taking advantage of our perceptual assumptions like structural continuity or color shifts under changing lighting conditions or 3-dimensionality. We don’t tend to go from making good guesses about what something is to being wildly, confidently incorrect when the texture changes microscopically.
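For concreteness, the kind of perturbation I mean is the standard gradient-sign (FGSM-style) construction: every pixel gets nudged by a tiny, loss-increasing amount that sits far below anything I would consciously notice. A minimal sketch, assuming a PyTorch image classifier; `model`, the batch, and the epsilon value are all placeholders:

```python
import torch.nn.functional as F

def fgsm_perturb(model, images, labels, epsilon=2 / 255):
    """Return perturbed copies of `images` (an (N, C, H, W) batch scaled to [0, 1])."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    # Nudge every pixel by +/- epsilon in the direction that increases the loss;
    # a 2/255 change per pixel is visually negligible, yet on an undefended
    # network it is often enough to flip a confident prediction.
    return (images + epsilon * images.grad.sign()).clamp(0.0, 1.0).detach()
```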
My guess would be that you could get rid of a lot of adversarial susceptibility from DL systems by adding in the right kind of recurrent connectivity (as in predictive coding, where hypotheses about what the network is looking at help it to interpret low-level features), or even by finding a less extremizing nonlinearity than ReLU (e.g.: https://towardsdatascience.com/neural-networks-an-alternative-to-relu-2e75ddaef95c). Such changes might get us closer to how the brain does things.
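To illustrate the nonlinearity half of that guess: the swap itself is trivial, which is part of why it seems worth testing. A rough sketch (Softplus is just one smooth example, not necessarily the alternative from the linked post, and whether any such swap actually reduces adversarial susceptibility is an empirical question):

```python
import torch.nn as nn

def small_cnn(activation=nn.ReLU):
    """Toy image classifier whose nonlinearity is a constructor argument."""
    return nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1),
        activation(),
        nn.Conv2d(16, 32, kernel_size=3, padding=1),
        activation(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(32, 10),
    )

relu_net = small_cnn(nn.ReLU)        # the standard, "extremizing" choice
smooth_net = small_cnn(nn.Softplus)  # a smoother drop-in; nn.ELU or nn.GELU would also fit
```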
Overparameterization, such as by making the network arbitrarily deep, might be able to get you around some of these limitations eventually (just like a fully connected NN can do the same thing as a CNN in principle), but I think we’ll have to change how we design neural networks at a fundamental level in order to avoid these issues more effectively in the long term.
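To spell out that parenthetical: a dense layer can represent a convolution exactly by zeroing and tying weights, it just has to learn that structure rather than getting it for free from the architecture. A toy check in numpy (the kernel and input here are arbitrary):

```python
import numpy as np

kernel = np.array([0.2, -1.0, 0.5])   # a 1D filter, applied CNN-style (cross-correlation)
x = np.random.randn(10)               # a toy input signal
out_len = len(x) - len(kernel) + 1

# Dense weight matrix whose rows are shifted copies of the same kernel:
# the weight sharing and sparsity a CNN gets by construction.
W = np.zeros((out_len, len(x)))
for i in range(out_len):
    W[i, i:i + len(kernel)] = kernel

conv_out = np.convolve(x, kernel[::-1], mode="valid")  # CNN-style cross-correlation
dense_out = W @ x
assert np.allclose(conv_out, dense_out)
```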
I don’t see how the brain would perceive the type of pixel-level adversarial perturbations most of us think of (e.g.: https://openai.com/blog/adversarial-example-research/) as anything other than noise, if it even reaches past the threshold of perception at all.
Look through https://www.gwern.net/docs/ai/adversarial/index. The theoretical work is the isoperimetry paper: https://arxiv.org/abs/2105.12806
Here is a paper showing that humans can classify pixel-level adversarial examples that look like noise at better-than-chance levels; see Experiment 4 (and also #5-6): https://www.nature.com/articles/s41467-019-08931-6
Thanks for the links!