IIRC this is probably the case for a broad range of non-NN models. I think the original Double Descent paper showed it for random Fourier features.
My current guess is that NN architectures are just especially affected by this, because their behavioral manifolds are even more degenerate, with RLCTs (real log canonical thresholds) ranging very widely from tiny to large.
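
For anyone who wants to poke at the random Fourier features case themselves, here is a minimal sketch of that kind of experiment. Everything in it is my own assumption rather than the paper's setup: the 1-D sine target, the noise level, the frequency scale of 5.0, the feature counts, and the function names `rff_features` and `rff_test_errors` are all made up for illustration. It fits minimum-norm least squares on random cosine features and reports test MSE as the feature count sweeps past the number of training points. I would expect the error to tend to spike somewhere around the interpolation threshold (features ≈ training points), but that is not guaranteed for any particular seed, so treat the numbers as something to look at rather than a result.

```python
import numpy as np


def rff_features(x, omegas, phases):
    """Map inputs x of shape (n, d) to random cosine features of shape (n, N)."""
    n_features = omegas.shape[1]
    return np.sqrt(2.0 / n_features) * np.cos(x @ omegas + phases)


def rff_test_errors(n_train=40, n_test=500,
                    feature_counts=(5, 10, 20, 40, 80, 320),
                    noise=0.1, seed=0):
    """Return {feature_count: test MSE} for min-norm least squares on RFFs."""
    rng = np.random.default_rng(seed)

    # Toy 1-D regression problem (an assumption, not the paper's dataset).
    x_train = rng.uniform(-1.0, 1.0, size=(n_train, 1))
    x_test = rng.uniform(-1.0, 1.0, size=(n_test, 1))

    def target(x):
        return np.sin(2.0 * np.pi * x[:, 0])

    y_train = target(x_train) + noise * rng.standard_normal(n_train)
    y_test = target(x_test)  # noiseless test targets

    errors = {}
    for n_feat in feature_counts:
        # Fresh random frequencies and phases for each model size.
        omegas = rng.normal(scale=5.0, size=(1, n_feat))
        phases = rng.uniform(0.0, 2.0 * np.pi, size=n_feat)

        phi_train = rff_features(x_train, omegas, phases)
        phi_test = rff_features(x_test, omegas, phases)

        # lstsq gives the least-squares fit, and the minimum-norm solution
        # when there are more features than training points.
        w, *_ = np.linalg.lstsq(phi_train, y_train, rcond=None)

        errors[n_feat] = float(np.mean((phi_test @ w - y_test) ** 2))
    return errors


if __name__ == "__main__":
    for n_feat, mse in rff_test_errors().items():
        print(f"{n_feat:4d} features: test MSE = {mse:.4f}")
```

Averaging over several seeds and using a finer grid of feature counts near `n_train` would probably give a much cleaner curve than a single run.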