They would exist in a sufficiently big random NN, but I think their density would be extremely low. Like, if you train a normal neural net with 15,000 neurons and it ends up with one car detector, the density of car detectors is now 1/15,000. Whereas I think the density at initialization is probably more like 1/2^50 or something like that (numbers completely made up), so they'd have a negligible effect on the NTK's learning ability. ('Slight tweaks' can't happen in the NTK regime, since by definition no intermediate functions change.)
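To make that concrete, here's the back-of-envelope arithmetic, using the same completely made-up density numbers; this is just a sketch, not a measurement from any real network:

```python
# Rough version of the density claim above.
# All numbers are illustrative, mirroring the made-up figures in the comment.

n_neurons = 15_000         # size of the hypothetical trained network
p_trained = 1 / 15_000     # roughly one car detector per 15k trained neurons
p_init = 2 ** -50          # made-up density of accidental car detectors at init

# Expected number of car-detector neurons in a network of this size:
expected_trained = n_neurons * p_trained   # = 1.0
expected_init = n_neurons * p_init         # ~ 1.3e-11, i.e. essentially none

print(f"expected detectors after training:    {expected_trained:.2e}")
print(f"expected detectors at initialization: {expected_init:.2e}")
```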
A difference with the pruning case is that the number of possible prunings grows exponentially with the number of neurons, whereas the number of neurons itself only grows linearly. My take on the LTH is that pruning is basically just a weird way of doing optimization, so it's not that surprising that you can get good performance.
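To put rough numbers on "exponential vs. linear" (again purely illustrative, assuming one keep/drop choice per prunable unit):

```python
import math

n = 15_000               # neurons (or prunable units) in the hypothetical net
num_neurons = n          # grows linearly with network size
num_prunings = 2 ** n    # each unit kept or dropped: exponential search space

# 2^15000 has over 4,500 digits, vs. 15,000 individual neurons, so pruning
# searches over vastly more candidate subnetworks than "inspect each neuron
# at initialization" does.
print(f"number of neurons:       {num_neurons}")
print(f"number of prunings: ~10^{n * math.log10(2):.0f}")
```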
+1 to this in particular; I think this is the main point Daniel (and many people like Daniel) is missing here. There's a very big difference between "car detector functions exist somewhere in the random jumble of a sufficiently big randomly initialized NN" vs "the net can be pruned to yield a car detector function", and the LTH papers show the latter.
I think I get this distinction; I realize the LTH papers show the latter. I guess our disagreement is about how big a deal / how surprising this is.