Thanks for the comment! Naively I feel like dropout would make things worse for the reason you mentioned, and anti-dropout better, but I’m definitely not an expert on this stuff.
I’m not sure I totally understand your first idea. Is the idea something like
- Feed some images through a NN and record which neurons have high average activation on them
- Randomly pick some of those neurons and record which dataset examples cause them to have a high average activation
- Pick some subset of those images and iterate until convergence? (Rough sketch of what I’m picturing below.)
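Something like this minimal PyTorch sketch is what I have in mind; `model`, `layer`, and `dataset` are placeholders rather than anything from your setup, and I’ve left out an actual convergence test:

```python
import torch

def per_image_activation(model, layer, images):
    """Mean activation of each neuron on each image: shape [n_images, n_neurons]."""
    recorded = {}
    def hook(module, inputs, output):
        out = output.detach()
        # Collapse spatial dims (if any) so each neuron gets one number per image.
        recorded["acts"] = out.flatten(2).mean(dim=-1) if out.dim() > 2 else out
    handle = layer.register_forward_hook(hook)
    with torch.no_grad():
        model(images)
    handle.remove()
    return recorded["acts"]

def co_select(model, layer, dataset, n_neurons=16, n_images=64, steps=5):
    images = dataset[:n_images]  # initial working set
    for _ in range(steps):
        acts = per_image_activation(model, layer, images)
        # 1. Neurons with the highest average activation on the current images.
        top_neurons = acts.mean(dim=0).topk(n_neurons).indices
        # 2. A random subset of those neurons.
        picked = top_neurons[torch.randperm(n_neurons)[: n_neurons // 2]]
        # 3. Dataset examples that most strongly activate the picked neurons.
        all_acts = per_image_activation(model, layer, dataset)
        images = dataset[all_acts[:, picked].mean(dim=1).topk(n_images).indices]
    return picked, images
```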
One interesting fact my group discovered is that dropout seems to increase the extent to which a network is modular. We have some results on the topic here, but a more comprehensive paper should be out soon.
Interesting, thanks! I stand corrected (and will read your paper)...
Dropout makes interpretation easier because it disincentivizes complicated features whose parts can only be understood in terms of their high-order correlations with other parts. If a feature relies on such correlations, it will be fragile to some of those pieces being dropped out.
Anti-dropout promotes consolidation of similar features into one, but it also incentivizes that consolidated feature to be maximally complicated and fragile.
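Here’s a toy illustration of the fragility point (my own example, not anything from the thread): a “feature” that is the conjunction of four parts almost never survives a dropout round, while a redundant sum of the same parts almost always does.

```python
import torch

torch.manual_seed(0)
drop = torch.nn.Dropout(p=0.5)  # standard dropout, active by default on a fresh module

parts = torch.ones(10000, 4)  # four "parts" that are all firing

# A feature that is the high-order conjunction of all four parts
# survives only when every part survives a dropout round:
conj = drop(parts).prod(dim=1)
print((conj > 0).float().mean())  # ~0.5**4 = 0.0625

# A redundant feature that just sums the parts degrades gracefully:
redundant = drop(parts).sum(dim=1)
print((redundant > 0).float().mean())  # ~1 - 0.5**4 = 0.9375
```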
Re: first idea. Yeah, something like that. Basically just an attempt to formalize “functionally similar neurons,” so that when you go to drop out a neuron, you actually drop out all of the functionally similar ones too.
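A minimal sketch of what that grouped dropout could look like, assuming the groups of functionally similar neurons have already been identified somehow (`grouped_dropout_mask` and the group indices here are made up for illustration):

```python
import torch

def grouped_dropout_mask(groups, n_units, p=0.5):
    """Dropout mask that drops whole groups of functionally similar units together."""
    mask = torch.ones(n_units)
    grouped = set()
    for g in groups:
        grouped.update(g.tolist())
        if torch.rand(()) < p:  # one coin flip per group, not per unit
            mask[g] = 0.0
    for i in range(n_units):  # ungrouped units are dropped independently as usual
        if i not in grouped and torch.rand(()) < p:
            mask[i] = 0.0
    return mask / (1 - p)  # inverted-dropout rescaling

# e.g. units 3, 17, and 42 were found to be functionally similar:
groups = [torch.tensor([3, 17, 42])]
h = torch.randn(8, 64)
h = h * grouped_dropout_mask(groups, h.shape[-1])
```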