Ignore the part about training against a sinusoid. That was a more specific hypothesis; the symmetry point is more general. Also ignore the part about “not changing the loss function”, since you’ve got the right math.
I’m a bit confused that you’re calling y a label vector; shouldn’t it be shaped like a data point? E.g. if I’m training an image classifier, that vector should be image-shaped. And then the typical symmetry we’d expect is that the kernel is (approximately) invariant to shifting the image left, right, up, or down by a pixel, and we could take any of those shifts to be R.
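To make that concrete, here's a minimal sketch of one such R (my own illustration, assuming numpy and wrap-around shifts; zero-padding would work too): R is just a permutation matrix acting on flattened images, and the symmetry condition is then $k(Rx, Rx') \approx k(x, x')$.

```python
import numpy as np

H, W = 8, 8

def make_shift_right(H, W):
    """Permutation matrix R that shifts a flattened (H, W) image right by 1 px,
    wrapping around at the edge (an assumption; zero-padding is the other option)."""
    n = H * W
    R = np.zeros((n, n))
    for i in range(H):
        for j in range(W):
            # output pixel (i, j+1 mod W) takes the value of input pixel (i, j)
            R[i * W + (j + 1) % W, i * W + j] = 1.0
    return R

R = make_shift_right(H, W)
x = np.random.default_rng(0).normal(size=H * W)  # a flattened "image"

# Applying R to the flat vector matches rolling the image along its width.
assert np.allclose((R @ x).reshape(H, W), np.roll(x.reshape(H, W), 1, axis=1))
```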
The eigenfunctions we are calculating are solutions to:
$$\lambda\,\phi(x') = \int_{x \sim D} k(x', x)\,\phi(x)\,dx$$
where $D$ is the data distribution, $\lambda$ is an eigenvalue, and $\phi(x)$ is an eigenfunction.
So the eigenfunction is a label function whose input $x$ is a datapoint. Its discrete approximation is a label vector, with one entry per sampled datapoint, which I called y above.
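Here's a minimal sketch of that discrete approximation, in case it helps (the RBF kernel and Gaussian samples are placeholders I'm assuming, not the actual setup): draw n points from $D$, build the Gram matrix $K_{ij} = k(x_i, x_j)$, and the eigenvectors of $K/n$ are the label vectors y, with eigenvalues approximating the $\lambda$'s above.

```python
import numpy as np

def k(x1, x2, length_scale=1.0):
    """Placeholder kernel (RBF); stand-in for whatever kernel is under study."""
    d2 = np.sum((x1 - x2) ** 2, axis=-1)
    return np.exp(-d2 / (2 * length_scale ** 2))

rng = np.random.default_rng(0)
n, dim = 500, 2
X = rng.normal(size=(n, dim))            # samples x ~ D (placeholder: Gaussian)

# Gram matrix: K[i, j] = k(x_i, x_j), built via broadcasting
K = k(X[:, None, :], X[None, :, :])

# lambda * phi(x') = E_{x~D}[k(x', x) phi(x)] becomes (1/n) K y = lambda y,
# so the eigenvalues of K/n approximate the integral operator's eigenvalues.
lams, Y = np.linalg.eigh(K / n)
lams, Y = lams[::-1], Y[:, ::-1]         # sort by descending eigenvalue

# Y[:, j] is the j-th label vector: eigenfunction phi_j evaluated at the samples.
print(lams[:5])
```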