I’m reminded a bit of the reason why Sudoku and quantum computing are difficult: the possibilities you have to track are not purely local; they can be nonlocal combinations of different things. General NNs seem like they’d be at least NP-hard to interpret.
But this is exactly what dropout is useful for: penalizing reliance on correlations. So maybe if you’re having trouble interpreting something, you can just crank up the dropout rate. On the other hand, dropout also promotes redundancy, which might make interpretation confusing; perhaps there’s something similar to dropout that’s even better for interpretability.
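To be concrete, “cranking up the dropout rate” would just mean something like this (a minimal sketch, assuming a PyTorch-style setup; the layer sizes and rates are arbitrary):

```python
# Minimal sketch, assuming PyTorch: "cranking up" dropout just means
# raising the drop probability p on the layers you want to decorrelate.
import torch.nn as nn

def make_mlp(p_drop: float = 0.5) -> nn.Sequential:
    # Higher p_drop puts stronger pressure against features that rely on
    # correlations between units, since any unit may be zeroed at train time.
    return nn.Sequential(
        nn.Linear(784, 256),
        nn.ReLU(),
        nn.Dropout(p=p_drop),  # each unit zeroed independently with prob p_drop
        nn.Linear(256, 10),
    )

net = make_mlp(p_drop=0.8)  # a deliberately aggressive setting
```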
Edit for unfiltered ideas:
You could automatically sample an image, find which neurons it excites, sample some of those neurons, sample images based on how much they excite those neurons, and so on, until you end up with a sampled pool of similar images and similar neurons. Then you drop out all similar neurons.
You could try anti-dropout: punishing the NN for redundancy and rewarding it for fragility/specificity. However, to avoid creating an incentive for fine-tuned activation/inhibition pairs, you only use positive activations for this step.
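Here’s a rough sketch of the redundancy-punishing half of anti-dropout (the name `anti_dropout_penalty` and the exact form are my own inventions; a full version would also need the term rewarding specificity):

```python
# Hedged sketch of an "anti-dropout" redundancy penalty; anti_dropout_penalty
# is a hypothetical name, not an existing API. Redundancy is measured as
# cosine similarity between neurons' activation patterns over a batch,
# using positive activations only (to rule out activation/inhibition pairs).
import torch
import torch.nn.functional as F

def anti_dropout_penalty(acts: torch.Tensor) -> torch.Tensor:
    # acts: (batch, neurons). Returns a scalar that is large when
    # distinct neurons fire on the same inputs.
    pos = F.relu(acts)                       # keep only positive activations
    pos = F.normalize(pos, dim=0, eps=1e-8)  # unit-norm each neuron's pattern
    sim = pos.T @ pos                        # (neurons, neurons) cosine matrix
    off_diag = sim - torch.diag(torch.diagonal(sim))
    return off_diag.mean()                   # punish similarity between neurons

# Usage sketch: loss = task_loss + lam * anti_dropout_penalty(hidden_acts)
```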
Thanks for the comment! Naively, I feel like dropout would make things worse for the reason you mentioned, and anti-dropout would make them better, but I’m definitely not an expert on this stuff.
I’m not sure I totally understand your first idea. Is the idea something like:
- Feed some images through a NN and record which neurons have high average activation on them
- Randomly pick some of those neurons and record which dataset examples cause them to have a high average activation
- Pick some subset of those images and iterate until convergence?
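If so, here’s roughly how I’d code up that loop (everything here is a placeholder sketch; `get_activations` is an assumed helper, and the convergence test is just one option):

```python
# Rough sketch of the image/neuron alternating loop described above.
# get_activations is an assumed helper, passed in by the caller, that
# returns a 1-D tensor of hidden activations for one input.
import torch

def find_similar_pool(get_activations, dataset, seed_idx, n_rounds=5, top_k=10):
    image_pool = {seed_idx}
    neuron_pool = set()
    for _ in range(n_rounds):
        # 1. Which neurons have high average activation on the current images?
        acts = torch.stack([get_activations(dataset[i]) for i in image_pool])
        neuron_pool |= set(acts.mean(dim=0).topk(top_k).indices.tolist())
        # 2. Which dataset examples most excite those neurons?
        all_acts = torch.stack([get_activations(x) for x in dataset])
        scores = all_acts[:, sorted(neuron_pool)].mean(dim=1)
        new_images = set(scores.topk(top_k).indices.tolist())
        if new_images <= image_pool:  # converged: no new images found
            break
        image_pool |= new_images
    return image_pool, neuron_pool  # this pool gets dropped out together
```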
One interesting fact my group discovered is that dropout seems to increase the extent to which a network is modular. We have some results on the topic here, but a more comprehensive paper should be out soon.
Interesting, thanks! I stand corrected (and will read your paper)...
Dropout makes interpretation easier because it disincentivizes complicated features whose parts can only be understood in terms of their high-order correlations with other parts: any feature that relies on such correlations is fragile to some of its pieces being dropped out.
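As a toy numerical illustration of that fragility (my own numbers, not from any paper): a feature that needs all of its parts active degrades much faster under dropout than one that averages redundant copies.

```python
# Toy illustration: under 50% dropout, a conjunctive feature (product of
# 4 parts) survives ~6% of the time, while an averaging feature keeps
# ~50% of its value.
import torch

torch.manual_seed(0)
p_drop = 0.5
parts = torch.ones(1000, 4)                       # 4 sub-features, all "on"
mask = (torch.rand_like(parts) > p_drop).float()  # independent dropout masks
dropped = parts * mask

redundant = dropped.mean(dim=1)    # E[value] = 1 - p_drop = 0.5
conjunctive = dropped.prod(dim=1)  # E[value] = (1 - p_drop)**4 = 0.0625

print(redundant.mean().item(), conjunctive.mean().item())
```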
Anti-dropout promotes consolidation of similar features into one, but it also incentivizes that one feature to be maximally complicated and fragile.
Re: first idea. Yeah, something like that. Basically it’s an attempt to formalize “functionally similar neurons,” so that when you go to drop out a neuron, you actually drop out all functionally similar ones.
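For concreteness, the grouped version of dropout might look like this (`grouped_dropout` is a made-up name, and one coin flip decides the fate of each whole pool):

```python
# Hypothetical "grouped dropout": when one neuron in a functionally
# similar pool would be dropped, the whole pool is dropped with it.
import torch

def grouped_dropout(acts, groups, p=0.5):
    # acts: (batch, neurons); groups: list of lists of neuron indices.
    out = acts.clone()
    for g in groups:
        if torch.rand(()).item() < p:  # one coin flip per functional group
            out[:, g] = 0.0
    return out
```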
Thanks!