One might expect y = f(x, θ0 + Δθ) to be a more expressive model than its linear approximation f(x, θ0) + ∇_θ f(x, θ0)·Δθ, but the parameters of very large neural nets appear to change only by a small amount during training, which means the overall Δθ found by training is nearly a solution to the linearly-approximated equations.
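To make the claim concrete, here's a minimal sketch (the toy tanh MLP, random data, and the hand-picked perturbation standing in for the trained Δθ are all my own assumptions, not anything from the original discussion) that compares the full output f(x, θ0 + Δθ) against the first-order linearization f(x, θ0) + ∇_θ f(x, θ0)·Δθ, with the Jacobian-vector product computed via jax.jvp. The lazy-training/NTK picture is just the claim that the Δθ actually found by training is small enough that these two quantities stay close.

```python
import jax
import jax.numpy as jnp

def init_mlp(key, sizes):
    # Simple tanh MLP with 1/sqrt(fan_in) init; widths here are arbitrary.
    params = []
    for din, dout in zip(sizes[:-1], sizes[1:]):
        key, wkey = jax.random.split(key)
        params.append((jax.random.normal(wkey, (din, dout)) / jnp.sqrt(din),
                       jnp.zeros(dout)))
    return params

def mlp(params, x):
    for w, b in params[:-1]:
        x = jnp.tanh(x @ w + b)
    w, b = params[-1]
    return x @ w + b

key = jax.random.PRNGKey(0)
theta0 = init_mlp(key, [16, 512, 512, 1])
x = jax.random.normal(jax.random.PRNGKey(1), (32, 16))

# A small, hypothetical parameter perturbation standing in for the Δθ
# that training would actually find.
dtheta = jax.tree_util.tree_map(lambda p: 1e-2 * jnp.ones_like(p), theta0)

# Exact output at the perturbed parameters: f(x, θ0 + Δθ).
theta1 = jax.tree_util.tree_map(lambda p, d: p + d, theta0, dtheta)
full = mlp(theta1, x)

# Linearized output: f(x, θ0) + ∇_θ f(x, θ0)·Δθ, via a Jacobian-vector product.
f0, jvp_out = jax.jvp(lambda p: mlp(p, x), (theta0,), (dtheta,))
linear = f0 + jvp_out

print("max |full - linear|:", float(jnp.max(jnp.abs(full - linear))))
```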
Note that this has changed over time, as network architectures change; I doubt that it applies to e.g. the latest LLMs. The point about pruning doing a whole bunch of optimization still applies independently of whether net training is linear-ish (though I don't know if anyone's reproduced the lottery-ticket-hypothesis-driven pruning experiments on the past couple years' worth of LLMs).
A bit of a side note, but I don't think you even need to appeal to new architectures: it looks like the NTK approximation performs substantially worse than ordinary (nonlinear) training even with plain MLPs (see this paper, among others).