Picture a linear approximation, like this:
The tangent space at point a is that whole line labelled “tangent”.
The main difference between the tangent space and the space of neural-networks-for-which-the-weights-are-very-close is that the tangent space extrapolates the linear approximation indefinitely; it’s not just limited to the region near the original point. (In practice, though, that difference does not actually matter much, at least for the problem at hand—we do stay close to the original point.)
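To make "extrapolates indefinitely" concrete, here's a minimal Python sketch (the function and evaluation points are just stand-ins for illustration): the tangent-space model is a genuine function defined on the whole line, it just stops tracking f once you wander far from the original point a.

```python
import numpy as np

def tangent_model(f, f_prime, a):
    """The tangent space at a, as a function: f(a) + f'(a) * (x - a), defined for ALL x."""
    fa, slope = f(a), f_prime(a)
    return lambda x: fa + slope * (x - a)

f = np.sin
t = tangent_model(f, np.cos, 0.0)  # tangent to sin at a = 0 is the line y = x

print(t(0.1), f(0.1))      # ~0.1 vs ~0.0998: nearly identical close to a
print(t(100.0), f(100.0))  # 100.0 vs ~-0.506: the line extrapolates, f does not follow
```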
The reason we want to talk about “the tangent space” is that it lets us precisely state things like Newton’s method in terms of search: Newton’s method finds a point at which f(x) is approximately 0 by finding the point where the tangent space hits zero (i.e. where the line in the picture above hits the x-axis). So, the tangent space effectively specifies the “search objective” or “optimization objective” for one step of Newton’s method.
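Here's that search objective as a toy Python sketch: one Newton step is literally "solve tangent(x) = 0", which gives the familiar update a - f(a)/f'(a). The function f(x) = x² - 2 is just an illustrative stand-in.

```python
import numpy as np

def newton_step(f, f_prime, a):
    """Solve the tangent model f(a) + f'(a) * (x - a) = 0 for x."""
    return a - f(a) / f_prime(a)

# Stand-in example: f(x) = x^2 - 2, so the iterates should approach sqrt(2).
f = lambda x: x**2 - 2
f_prime = lambda x: 2 * x

a = 1.5
for _ in range(4):
    a = newton_step(f, f_prime, a)  # each step jumps to where the current tangent line hits zero
print(a, np.sqrt(2))  # both ~1.41421356
```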
In the NTK/GP model, neural net training is functionally identical to one step of Newton’s method (though it’s Newton’s method in many dimensions, rather than one dimension).
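A toy numpy sketch of that many-dimensional step, with a random matrix standing in for a real network's Jacobian at initialization (sizes and names here are hypothetical, purely for illustration): in the linearized regime, training amounts to solving the tangent model for the weight update that fits the targets, i.e. one least-squares Newton step.

```python
import numpy as np

# Hypothetical sizes, just for illustration: 5 training points, 20 weights.
rng = np.random.default_rng(0)
n_data, n_params = 5, 20
f_w0 = rng.normal(size=n_data)           # network outputs at the initial weights w0
J = rng.normal(size=(n_data, n_params))  # stand-in for the Jacobian df/dw at w0
y = rng.normal(size=n_data)              # training targets

# The tangent model is f_lin(w) = f(w0) + J @ (w - w0). "Training" it means
# solving J @ dw = y - f(w0); with more weights than data points we take the
# minimum-norm least-squares solution, which is what lstsq returns.
dw, *_ = np.linalg.lstsq(J, y - f_w0, rcond=None)

print(np.allclose(f_w0 + J @ dw, y))  # True: one linear solve fits the targets exactly
```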
The tangent space at a point a is tangent to what manifold?
I recommend just reading the math here. Leave a comment if it’s unclear after that.