Part of why I think the two tickets are the same is that the at-initialization ticket is found by taking the after-training ticket and rewinding it to the beginning!
This is true in the original LTH paper, but there the “at-initialization ticket” doesn’t actually perform well: it’s just easy to train to high performance.
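For concreteness, the rewind procedure being discussed can be sketched roughly as follows (a toy sketch of iterative magnitude pruning with rewind-to-init; `find_ticket`, `fake_train`, and all parameter values are my own illustrative names, not from either paper):

```python
import numpy as np

def find_ticket(init_weights, train, prune_frac=0.2, rounds=3):
    """Sketch of the original-LTH recipe: repeatedly train, prune the
    smallest surviving weights, and finally 'rewind' by applying the
    pruning mask to the ORIGINAL initialization."""
    mask = {k: np.ones_like(w) for k, w in init_weights.items()}
    for _ in range(rounds):
        trained = train(init_weights, mask)  # retrain the masked net from init
        for k, w in trained.items():
            alive = np.abs(w[mask[k] == 1])  # magnitudes of surviving weights
            thresh = np.quantile(alive, prune_frac)
            mask[k] = np.where(np.abs(w) <= thresh, 0.0, mask[k])
    # the "at-initialization ticket": original init weights under the final mask
    rewound = {k: init_weights[k] * mask[k] for k in init_weights}
    return mask, rewound

# Toy usage with a fake "train" that just perturbs the masked weights.
rng = np.random.default_rng(0)
init = {"layer": rng.normal(size=(4, 4))}
fake_train = lambda ws, m: {k: (w + 0.1 * rng.normal(size=w.shape)) * m[k]
                            for k, w in ws.items()}
mask, ticket = find_ticket(init, fake_train)
```

The point of the sketch is the last step: the ticket’s surviving weights are copied from the original initialization, which is why the after-training ticket and the at-initialization ticket share weights by construction.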
In the multi-prize LTH paper, the “at-initialization ticket” does perform well, but they don’t find it by winding back the weights of a trained pruned network.
If you got multi-prize at-initialization tickets by winding back the weights of a trained pruned network, I would find that pretty convincing—the idea that they’d be totally different networks would seem like too much of a coincidence. But I would still want to actually check whether the weights were the same (which, funnily enough, isn’t trivial if you’re not familiar with a little-discussed symmetry of DNNs: for a hidden-layer neuron with a ReLU activation function, you can scale its input weights up by a positive constant and its output weights down by the same constant without changing the function the network computes).
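That rescaling symmetry is easy to verify numerically. Here is a minimal sketch with a one-hidden-layer ReLU network (all names and sizes are illustrative; note the neuron’s bias must be rescaled along with its input weights):

```python
import numpy as np

# Toy network: f(x) = W2 @ relu(W1 @ x + b1) + b2
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=5)
W2, b2 = rng.normal(size=(2, 5)), rng.normal(size=2)

def forward(W1, b1, W2, b2, x):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

# Rescale hidden neuron 0: multiply its incoming weights (and bias) by
# c > 0, divide its outgoing weights by c. ReLU is positively
# homogeneous -- relu(c*z) = c*relu(z) for c > 0 -- so the factor of c
# cancels in the next layer and the network's function is unchanged.
c = 3.7
W1s, b1s, W2s = W1.copy(), b1.copy(), W2.copy()
W1s[0] *= c
b1s[0] *= c
W2s[:, 0] /= c

x = rng.normal(size=3)
assert np.allclose(forward(W1, b1, W2, b2, x),
                   forward(W1s, b1s, W2s, b2, x))
```

So two networks with wildly different-looking weight matrices can compute exactly the same function, which is why a naive elementwise comparison of weights isn’t a sufficient check.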
OH this indeed changes everything (about what I had been thinking) thank you! I shall have to puzzle over these ideas some more then, and probably read the multi-prize paper more closely (I only skimmed it earlier)
Ah to be clear I am entirely basing my comments off of reading the abstracts (and skimming the multi-prize paper with an eye one develops after having been an ML PhD student for mumbles indistinctly years).