interstice comments on Updating the Lottery Ticket Hypothesis

interstice 21 Apr 2021 18:50 UTC
5 points
Yup, people have done this (taking the infinite-width limit at the same time): see here, here. Generally the kernels do worse than the original networks, but not by a lot. On the other hand, they’re usually applied to problems that aren’t super-hard, where non-neural-net classifiers already worked pretty well. And these models definitely can’t explain feature learning, since the functions computed by individual neurons don’t change at all during training.
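
To make the "kernel instead of network" idea concrete, here is a rough sketch of my own, not taken from the linked papers. It assumes a two-layer ReLU network f(x) = a · relu(Wx) / sqrt(m), builds the finite-width empirical tangent kernel at initialization (a stand-in for the exact infinite-width kernel), and fits the data by kernel ridge regression. The names `empirical_ntk` and `ntk_regression`, the toy data, and the small `ridge` term are all hypothetical choices for illustration. Note that `W` and `a` are never updated: every hidden unit computes the same function before and after fitting, which is the sense in which these models have no feature learning.

```python
import numpy as np

def empirical_ntk(X1, X2, W, a):
    """Tangent kernel of f(x) = a . relu(W x) / sqrt(m) at fixed (W, a).

    K(x, x') is the inner product of the parameter gradients of f at x and x'.
    """
    m = W.shape[0]
    pre1, pre2 = X1 @ W.T, X2 @ W.T                      # pre-activations
    act1, act2 = np.maximum(pre1, 0), np.maximum(pre2, 0)
    # Gradient w.r.t. a_j is relu(w_j . x) / sqrt(m).
    k_a = act1 @ act2.T / m
    # Gradient w.r.t. w_j is a_j * 1[w_j . x > 0] * x / sqrt(m).
    d1, d2 = (pre1 > 0) * a, (pre2 > 0) * a
    k_w = (d1 @ d2.T) * (X1 @ X2.T) / m
    return k_a + k_w

def ntk_regression(X_train, y_train, X_test, W, a, ridge=1e-6):
    """Kernel ridge regression with the tangent kernel; W and a stay frozen."""
    K = empirical_ntk(X_train, X_train, W, a)
    alpha = np.linalg.solve(K + ridge * np.eye(len(X_train)), y_train)
    return empirical_ntk(X_test, X_train, W, a) @ alpha

# Toy data (assumed for illustration): unit-norm inputs, smooth target.
rng = np.random.default_rng(0)
n, d, m = 20, 5, 2048
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = np.sin(3 * X[:, 0])

W = rng.normal(size=(m, d))   # random hidden weights, fixed at init
a = rng.normal(size=m)        # random output weights, fixed at init

K_train = empirical_ntk(X, X, W, a)
train_preds = ntk_regression(X, y, X, W, a)
```

The predictor here only depends on the data through the fixed kernel, so all of the "learning" happens in the coefficients `alpha`. Increasing `m` should bring the empirical kernel closer to its infinite-width limit, though this sketch makes no attempt to verify that.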