IIUC, here’s a simple way to test this hypothesis: initialize a random neural network, and then find the minimum-loss point in the tangent space. Since the tangent space is linear, this is easy to do (i.e. it doesn’t require heuristic gradient descent): for square loss it’s just solving a large linear system once; for many other losses it should amount to convex optimization, for which we have provably efficient algorithms. And I’d guess the problem is underdetermined, so you add some regularization. Is the result about as good as normal gradient descent in the actual parameter space? I’m guessing some of the linked papers might have done something like this?
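For concreteness, here is a minimal sketch of that test in JAX, not taken from any of the linked papers: linearize a randomly initialized MLP around its initial parameters, then solve the resulting ridge-regularized least-squares problem in closed form (dual/kernel form, since the parameter count exceeds the number of data points). The architecture, ridge coefficient, and toy data are all illustrative placeholders.

```python
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def init_mlp(key, sizes):
    """Random MLP parameters: list of (W, b) pairs."""
    params = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (d_in, d_out)) / jnp.sqrt(d_in),
                       jnp.zeros(d_out)))
    return params

def mlp(params, x):
    """Forward pass; returns one scalar output per input row."""
    for W, b in params[:-1]:
        x = jax.nn.relu(x @ W + b)
    W, b = params[-1]
    return (x @ W + b).squeeze(-1)

params0 = init_mlp(jax.random.PRNGKey(0), [10, 256, 256, 1])
theta0, unravel = ravel_pytree(params0)

def features(x):
    """Tangent-space feature map: phi(x) = d f(x; theta) / d theta at theta0."""
    return jax.grad(lambda th: mlp(unravel(th), x[None, :])[0])(theta0)

# Toy regression data (placeholder for a real dataset).
kx, ky = jax.random.split(jax.random.PRNGKey(1))
X = jax.random.normal(kx, (200, 10))
y = jnp.sin(X[:, 0]) + 0.1 * jax.random.normal(ky, (200,))

Phi = jax.vmap(features)(X)            # (n, p) Jacobian features
residual = y - mlp(params0, X)         # linear part fits y - f(x; theta0)

# Underdetermined (p >> n), so add ridge regularization and solve the
# dual system: alpha = (Phi Phi^T + lam I)^{-1} residual.
lam = 1e-3
K = Phi @ Phi.T                        # empirical NTK Gram matrix
alpha = jnp.linalg.solve(K + lam * jnp.eye(K.shape[0]), residual)
delta_theta = Phi.T @ alpha            # min-norm tangent-space solution

def f_linearized(x):
    return mlp(params0, x) + jax.vmap(features)(x) @ delta_theta

print("train MSE of linearized fit:", jnp.mean((f_linearized(X) - y) ** 2))
```

The comparison the comment asks about would then be between `f_linearized` and the same architecture trained by ordinary gradient descent on the same data.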
Yup, people have done this (taking the infinite-width limit at the same time): see here and here. Generally the kernels do worse than the original networks, but not by a lot. On the other hand, they’re usually applied to problems that aren’t super hard, where non-neural-net classifiers already worked pretty well. And these models definitely can’t explain feature learning, since the functions computed by individual neurons don’t change at all during training.
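As a hedged sketch of what the infinite-width version looks like in practice (assuming the `neural_tangents` stax API; the architecture, widths, and toy data below are placeholders, not the setups used in the linked papers), one computes the exact infinite-width NTK and does kernel ridge regression with it:

```python
import jax.numpy as jnp
from jax import random
from neural_tangents import stax

# Infinite-width analogue of a 2-hidden-layer ReLU MLP.
_, _, kernel_fn = stax.serial(
    stax.Dense(512), stax.Relu(),
    stax.Dense(512), stax.Relu(),
    stax.Dense(1))

x_train = random.normal(random.PRNGKey(0), (200, 10))
y_train = jnp.sin(x_train[:, :1])
x_test = random.normal(random.PRNGKey(1), (50, 10))

# Exact NTK Gram matrices in the infinite-width limit.
k_tt = kernel_fn(x_train, x_train, 'ntk')
k_st = kernel_fn(x_test, x_train, 'ntk')

# Kernel ridge regression: the infinite-width counterpart of training the
# linearized network to convergence on square loss, with a small
# diagonal regularizer for numerical stability.
reg = 1e-4
y_pred = k_st @ jnp.linalg.solve(k_tt + reg * jnp.eye(k_tt.shape[0]), y_train)
```

In this limit the kernel is fixed at initialization, which is the sense in which individual neurons’ functions never change during training.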
This basically matches my current understanding. (Though I’m not strongly confident in it.) I believe the GP results are basically equivalent to this, but I haven’t read up on the topic enough to be sure.