I’m trying to get a quick intuition of this. I’ve not read the papers.
My attempt:
On a compact domain, any continuous function can be uniformly approximated by polynomials (Weierstrass)
Powers grow very quickly away from the origin, so building a nice (say, bounded) function from a power series takes many terms, with the high powers correcting each other near the edges of the domain
As the domain gets larger, the approximation gets harder: a higher degree is needed for the same accuracy
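A quick numerical sanity check of the last two points, as a minimal sketch. It uses Chebyshev interpolation as a stand-in for "best uniform polynomial approximation" (an assumption, not anything taken from the papers), uses sin(x) on [-L, L] as the target, and finds the smallest degree whose maximum error on a sample grid drops below 1e-8. The helper names (`cheb_coeffs`, `cheb_eval`, `max_error`, `min_degree`) are hypothetical, made up for illustration.

```python
import math

# Hypothetical helpers for illustration. Chebyshev interpolation stands in for
# "best uniform polynomial approximation" (an assumption).

def cheb_coeffs(f, n, lo, hi):
    """Coefficients c_0..c_n of the degree-n Chebyshev interpolant of f on [lo, hi]."""
    N = n + 1
    # Angles of the Chebyshev nodes of the first kind; nodes are cos(theta) in [-1, 1]
    thetas = [math.pi * (k + 0.5) / N for k in range(N)]
    # Sample f at the nodes after mapping them from [-1, 1] to [lo, hi]
    fx = [f(0.5 * (hi - lo) * math.cos(t) + 0.5 * (hi + lo)) for t in thetas]
    # Discrete orthogonality of T_j at these nodes gives the coefficients directly
    coeffs = [(2.0 / N) * sum(fx[k] * math.cos(j * thetas[k]) for k in range(N))
              for j in range(N)]
    coeffs[0] /= 2.0
    return coeffs

def cheb_eval(coeffs, x, lo, hi):
    """Evaluate sum_j c_j T_j(u), with u = x mapped to [-1, 1], via Clenshaw's recurrence."""
    u = (2.0 * x - lo - hi) / (hi - lo)
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = 2.0 * u * b1 - b2 + c, b1
    return u * b1 - b2 + coeffs[0]

def max_error(f, coeffs, lo, hi, samples=1001):
    """Approximate the sup-norm error by sampling an evenly spaced grid."""
    xs = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
    return max(abs(f(x) - cheb_eval(coeffs, x, lo, hi)) for x in xs)

def min_degree(f, half_width, tol=1e-8, max_deg=400):
    """Smallest degree n whose interpolant on [-half_width, half_width] has error < tol."""
    for n in range(max_deg + 1):
        c = cheb_coeffs(f, n, -half_width, half_width)
        if max_error(f, c, -half_width, half_width) < tol:
            return n
    return None  # tol not reached within max_deg

if __name__ == "__main__":
    for L in (1, 2, 4, 8, 16, 32):
        print(f"sin on [-{L}, {L}]: degree {min_degree(math.sin, L)}")
```

For sin the required degree should grow roughly linearly in L once L is large: the Chebyshev coefficients of sin(L u) on [-1, 1] are (up to sign) 2 J_k(L) for odd k, and Bessel values J_k(L) only start decaying fast once k exceeds about L.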
So the relevant question is: how does the degree at the training phase transition scale with domain size, domain dimensionality, and the decay rate of the function's Fourier series?
Does this make sense?