Ignore the part about training against a sinusoid. That was a more specific hypothesis; the symmetry point is more general. Also ignore the part about “not changing the loss function”, since you’ve got the right math.
I’m a bit confused that you’re calling y a label vector; shouldn’t it be shaped like a data point? E.g. if I’m training an image classifier, that vector should be image-shaped. And then the typical symmetry we’d expect is that the kernel is (approximately) invariant to shifting the image left, right, up, or down by a pixel, and we could take any of those shifts to be R.
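To make that concrete, here's a minimal sketch of one such R (my own illustration, assuming numpy and wrap-around shifts; zero-padding would work too): R is just a permutation matrix acting on flattened images, and the symmetry condition is then $k(Rx, Rx') \approx k(x, x')$.

```python
import numpy as np

H, W = 8, 8

def make_shift_right(H, W):
    """Permutation matrix R that shifts a flattened (H, W) image right by 1 px,
    wrapping around at the edge (an assumption; zero-padding is the other option)."""
    n = H * W
    R = np.zeros((n, n))
    for i in range(H):
        for j in range(W):
            # output pixel (i, j+1 mod W) takes the value of input pixel (i, j)
            R[i * W + (j + 1) % W, i * W + j] = 1.0
    return R

R = make_shift_right(H, W)
x = np.random.default_rng(0).normal(size=H * W)  # a flattened "image"

# Applying R to the flat vector matches rolling the image along its width.
assert np.allclose((R @ x).reshape(H, W), np.roll(x.reshape(H, W), 1, axis=1))
```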
The eigenfunctions we are calculating are solutions to:
$$\lambda\,\phi(x') = \int_{x \sim D} k(x', x)\,\phi(x)\,dx$$
where $D$ is the data distribution, $\lambda$ is an eigenvalue, and $\phi(x)$ is an eigenfunction.
So the eigenfunction is a label function whose input $x$ is a datapoint. Its discrete approximation is a label vector, with one entry per sampled datapoint, which I called y above.
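Here's a minimal sketch of that discrete approximation, in case it helps (the RBF kernel and Gaussian samples are placeholders I'm assuming, not the actual setup): draw n points from $D$, build the Gram matrix $K_{ij} = k(x_i, x_j)$, and the eigenvectors of $K/n$ are the label vectors y, with eigenvalues approximating the $\lambda$'s above.

```python
import numpy as np

def k(x1, x2, length_scale=1.0):
    """Placeholder kernel (RBF); stand-in for whatever kernel is under study."""
    d2 = np.sum((x1 - x2) ** 2, axis=-1)
    return np.exp(-d2 / (2 * length_scale ** 2))

rng = np.random.default_rng(0)
n, dim = 500, 2
X = rng.normal(size=(n, dim))            # samples x ~ D (placeholder: Gaussian)

# Gram matrix: K[i, j] = k(x_i, x_j), built via broadcasting
K = k(X[:, None, :], X[None, :, :])

# lambda * phi(x') = E_{x~D}[k(x', x) phi(x)] becomes (1/n) K y = lambda y,
# so the eigenvalues of K/n approximate the integral operator's eigenvalues.
lams, Y = np.linalg.eigh(K / n)
lams, Y = lams[::-1], Y[:, ::-1]         # sort by descending eigenvalue

# Y[:, j] is the j-th label vector: eigenfunction phi_j evaluated at the samples.
print(lams[:5])
```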