Thanks Preetum. You’re right, I missed that note the first time—I edited my comment a bit.
It might be illuminating to say “the polynomial found by iterating weights starting at 0” instead of “the polynomial found with gradient descent”, since in this case, the inductive bias comes from the initialization point, not necessarily gradient descent per se. Neural nets can’t learn if all the weights are initialized to 0 at the start, of course :)
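To illustrate the initialization point, here's a minimal numpy sketch (dimensions and learning rate are made up): on an underdetermined least-squares problem, plain gradient descent started at w = 0 converges to the same minimum-norm solution the pseudoinverse gives, because the iterates never leave the row space of X.

```python
import numpy as np

rng = np.random.default_rng(0)

# Underdetermined least squares: more features than samples,
# so infinitely many interpolating solutions exist.
X = rng.normal(size=(10, 30))
y = rng.normal(size=10)

# Minimum-norm interpolating solution via the pseudoinverse.
w_pinv = np.linalg.pinv(X) @ y

# Plain gradient descent on squared error, starting from w = 0.
w = np.zeros(30)
lr = 0.01
for _ in range(50_000):
    w -= lr * X.T @ (X @ w - y)

# Started at 0, the GD iterates stay in the row space of X and
# converge to the same minimum-norm solution.
print(np.allclose(w, w_pinv, atol=1e-6))  # True
```

Start GD somewhere else and it converges to a different interpolating solution (0's offset by the initial point's component orthogonal to the row space), which is why the inductive bias is really about where you start.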
BTW, I tried switching from pseudoinverse to regularized linear regression, and the super high degree polynomials seemed more overfit to me.
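The swap I mean is roughly the following sketch (toy data; the degree and the regularization strength `alpha` are placeholder choices, not the ones from the actual experiment):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: a few noisy samples, high-degree polynomial features.
n, d = 10, 20
x = np.linspace(-1, 1, n)
y = np.sin(np.pi * x) + 0.1 * rng.normal(size=n)
Phi = np.vander(x, d + 1, increasing=True)  # columns are 1, x, x^2, ...

# Pseudoinverse: the minimum-norm interpolating fit.
w_pinv = np.linalg.pinv(Phi) @ y

# Ridge regression: solve (Phi^T Phi + alpha * I) w = Phi^T y.
alpha = 1e-3  # placeholder regularization strength
w_ridge = np.linalg.solve(Phi.T @ Phi + alpha * np.eye(d + 1), Phi.T @ y)

# Ridge trades exact interpolation for smaller coefficients, so its
# weight norm is at most the minimum-norm fit's.
print(np.linalg.norm(w_pinv), np.linalg.norm(w_ridge))
```

As alpha goes to 0, the ridge solution recovers the pseudoinverse (minimum-norm) fit, so the two differ only through the regularization term.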