johnswentworth comments on Updating the Lottery Ticket Hypothesis

johnswentworth 20 Apr 2021 18:09 UTC
LW: 4 AF: 4
AF
Yeah, I agree that something more general than one neuron but less general than (or at least different from) pruning might be appropriate. I’m not particularly worried about where that line “should” be drawn a priori, because the tangent space indeed seems like the right place to draw the line empirically.
- abramdemski 20 Apr 2021 21:28 UTC
  LW: 6 AF: 6
  AF Parent
  Wait… so:
  1. The tangent-space hypothesis implies something close to “GD finds a solution if and only if there’s already a dog-detecting neuron” (for large networks, that is). Specifically, it seems to imply something pretty close to “there’s already a feature”, where “feature” means a linear combination of existing neurons within a single layer
  2. GD does in fact train NNs to recognize dogs
  3. Therefore, we’re still in the territory of “there’s already a dog detector”
  ...yeah?
  - interstice 20 Apr 2021 22:40 UTC
    5 points
    Parent
    The tangent-space hypothesis implies something close to this, but not quite: instead of a ‘dog-detecting neuron’, it’s a ‘parameter such that the partial derivative of the output with respect to that parameter, as a function of the input, implements a dog-detector’. This would include neurons, via the partial derivative w.r.t. their bias.
  - johnswentworth 20 Apr 2021 22:32 UTC
    LW: 3 AF: 2
    AF Parent
    Not quite. The linear expansion isn’t just over the parameters associated with one layer; it’s over all the parameters in the whole net.
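
For concreteness, here is a minimal sketch of the linear expansion being discussed. It is not from any of the commenters. The one-hidden-unit network `f`, the helper names `tangent_features` and `linearized`, and all numeric values are assumptions made up for illustration. Each partial derivative of the output with respect to a parameter, viewed as a function of the input, plays the role of a "feature", and the first-order expansion sums over every parameter in the net (both layers, biases included).

```python
import math

def f(theta, x):
    """Toy net (hypothetical): one tanh hidden unit, scalar input and output."""
    w1, b1, w2, b2 = theta
    return w2 * math.tanh(w1 * x + b1) + b2

def tangent_features(theta, x, eps=1e-6):
    """Central finite-difference estimate of df/dtheta_i at input x, for every parameter."""
    feats = []
    for i in range(len(theta)):
        up = list(theta)
        up[i] += eps
        dn = list(theta)
        dn[i] -= eps
        feats.append((f(up, x) - f(dn, x)) / (2 * eps))
    return feats

def linearized(theta0, delta, x):
    """First-order expansion around theta0:
    f(x; theta0) + sum_i delta_i * df/dtheta_i(x), summed over ALL parameters."""
    feats = tangent_features(theta0, x)
    return f(theta0, x) + sum(d * g for d, g in zip(delta, feats))

# Illustrative values only (assumptions, not taken from the thread).
theta0 = [0.5, -0.2, 1.3, 0.1]     # initial parameters: w1, b1, w2, b2
delta = [1e-3, -2e-3, 5e-4, 1e-3]  # a small parameter update
x = 0.7

exact = f([t + d for t, d in zip(theta0, delta)], x)
approx = linearized(theta0, delta, x)
print(abs(exact - approx))  # small when delta is small
```

In this toy case the feature for the hidden unit's bias `b1` is `w2 * (1 - tanh(w1*x + b1)**2)`, which is the sense in which a neuron enters the expansion "via its bias". The first-layer weight `w1` and the second-layer parameters `w2`, `b2` all contribute terms to the same sum.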