1. The tangent-space hypothesis implies something close to “GD finds a solution if and only if there’s already a dog-detecting neuron” (for large networks, that is); more specifically, it seems to imply something close to “there’s already a feature”, where “feature” means a linear combination of existing neurons within a single layer.
2. GD in fact trains NNs to recognize dogs.
3. Therefore, we’re still in “there’s already a dog detector” territory.
The tangent-space hypothesis implies something close to this, but not quite: instead of ‘dog-detecting neuron’, it’s ‘parameter such that the partial derivative of the output with respect to that parameter, viewed as a function of the input, implements a dog-detector’. This would include neurons, via the partial derivative with respect to their bias.
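To make the ‘partial derivative as detector’ reading concrete, here is the first-order expansion the tangent-space picture rests on (a sketch; the symbols $f$, $\theta_0$, $b_i$, $z_i$ are my notation, not anything fixed in the exchange). Linearizing the network output around the initialization $\theta_0$,

$$f(x;\theta) \;\approx\; f(x;\theta_0) + \sum_j \frac{\partial f}{\partial \theta_j}(x;\theta_0)\,\big(\theta_j - \theta_{0,j}\big),$$

training in the tangent space can only reweight the fixed functions $x \mapsto \frac{\partial f}{\partial \theta_j}(x;\theta_0)$. And since a hidden neuron’s bias $b_i$ enters its preactivation $z_i$ additively, $\frac{\partial f}{\partial b_i}(x) = \frac{\partial f}{\partial z_i}(x;\theta_0)$, which is the sense in which neurons themselves show up among these tangent features.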
Wait… so:
1. The tangent-space hypothesis implies something close to “GD finds a solution if and only if there’s already a dog-detecting neuron” (for large networks, that is); more specifically, it seems to imply something close to “there’s already a feature”, where “feature” means a linear combination of existing neurons within a single layer.
2. GD in fact trains NNs to recognize dogs.
3. Therefore, we’re still in “there’s already a dog detector” territory.
...yeah?
Not quite. The linear expansion isn’t just over the parameters associated with a single layer; it’s over all the parameters in the whole net.
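To see that operationally: in the linearized model, the displacement the linear term acts on has the same shape as the full parameter vector, every layer’s weights and biases at once. A minimal JAX sketch (the tiny MLP and all names in it are my own illustrative assumptions, not anything from the exchange above):

```python
import jax
import jax.numpy as jnp

def mlp(params, x):
    # A toy two-layer MLP standing in for "the network".
    W1, b1, W2, b2 = params
    h = jnp.tanh(W1 @ x + b1)
    return W2 @ h + b2

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
params0 = (
    jax.random.normal(k1, (32, 8)),  # W1
    jnp.zeros(32),                   # b1
    jax.random.normal(k2, (1, 32)),  # W2
    jnp.zeros(1),                    # b2
)
x = jnp.ones(8)

# Linearize f(theta) = mlp(theta, x) around the initialization theta_0.
# f_lin is exactly the first-order term: it maps a parameter displacement
# d_theta to sum_j (df/dtheta_j)(x; theta_0) * d_theta_j.
f0, f_lin = jax.linearize(lambda p: mlp(p, x), params0)

# The displacement ranges over *all* parameters of the whole net, not the
# parameters of one layer: d_theta mirrors the full pytree of params0.
d_theta = jax.tree_util.tree_map(jnp.ones_like, params0)
print(f0 + f_lin(d_theta))  # tangent-space prediction at theta_0 + d_theta
```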