My take on the LTH is that pruning is basically just a weird way of doing optimization, so it's not that surprising that you can get good performance.
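To make the "pruning is just optimization" framing concrete, here's a minimal sketch (mine, not from the papers under discussion) in the spirit of the "supermask" result from Zhou et al. (2019): the weights are frozen at their random initialization, and gradient descent trains only a binary pruning mask via a straight-through estimator. The `MaskedLinear` class and the XOR task are hypothetical, purely illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Module):
    """Linear layer with frozen random weights; only a pruning mask is trained."""
    def __init__(self, in_features, out_features):
        super().__init__()
        # Weights stay fixed at random init and are never updated.
        self.weight = nn.Parameter(
            torch.randn(out_features, in_features) / in_features ** 0.5,
            requires_grad=False,
        )
        # Real-valued scores; their sign determines which weights survive pruning.
        self.scores = nn.Parameter(torch.randn(out_features, in_features) * 0.01)

    def forward(self, x):
        # Straight-through estimator: binarize on the forward pass,
        # but let gradients flow to the underlying scores.
        hard_mask = (self.scores > 0).float()
        mask = hard_mask + self.scores - self.scores.detach()
        return F.linear(x, self.weight * mask)

# Tiny demo: "prune" a random net to fit XOR, updating only the mask scores.
net = nn.Sequential(MaskedLinear(2, 64), nn.ReLU(), MaskedLinear(64, 1))
x = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])
opt = torch.optim.Adam([p for p in net.parameters() if p.requires_grad], lr=0.01)
for step in range(2000):
    opt.zero_grad()
    loss = F.mse_loss(net(x), y)
    loss.backward()
    opt.step()
print(loss.item())  # loss falls toward 0 without a single weight ever changing
```

The point being: searching over which of the ~2^N weights to keep is itself a (gradient-guided) optimization over subnetworks, so the fact that a good subnetwork can be found by pruning is less mysterious than it might first sound.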
+1 to this in particular; I think this is the main point Daniel (and many people like Daniel) is missing here. There's a very big difference between "car detector functions exist somewhere in the random jumble of a sufficiently big randomly initialized NN" and "the net can be pruned to yield a car detector function", and the LTH papers show the latter.
I think I get this distinction, and I realize the LTH papers show the latter; I guess our disagreement is about how big a deal / how surprising this is.