Rohin Shah comments on [AN #77]: Double descent: a unification of statistical theory and modern ML practice

Rohin Shah 19 Dec 2019 0:49 UTC
LW: 4 AF: 3
Or are you saying that even if there’s only one model at the interpolation threshold that fits the data, you’d expect the training procedure to pick a different model (one that doesn’t completely fit the data) instead, because of the bias towards generalizability?
Yup, that.
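
To make the distinction concrete, here is a minimal toy sketch in NumPy. Everything in it is an illustrative assumption rather than anything from the paper or the thread: a linear model with exactly as many parameters as data points stands in for "the interpolation threshold", and a ridge penalty stands in for the training procedure's bias. It is not a claim about what SGD on neural nets actually does. With a square, invertible design matrix there is exactly one weight vector that fits the training data, and the penalized fit picks a different, smaller-norm vector that leaves some training error.

```python
import numpy as np


def fit_at_threshold(n=10, noise=0.5, ridge=1.0, seed=0):
    """Compare the unique interpolating fit with a ridge-penalized fit.

    All names and defaults here are hypothetical, chosen for illustration.
    """
    rng = np.random.default_rng(seed)

    # n data points and n parameters: the square case, used here as a
    # stand-in for the interpolation threshold.
    X = rng.normal(size=(n, n))
    w_true = rng.normal(size=n)
    y = X @ w_true + noise * rng.normal(size=n)

    # The only weight vector that fits the (noisy) training data exactly.
    w_interp = np.linalg.solve(X, y)

    # Ridge solution: trades some training error for a smaller norm.
    # Assumption: this penalty plays the role of the inductive bias.
    w_ridge = np.linalg.solve(X.T @ X + ridge * np.eye(n), X.T @ y)

    def train_mse(w):
        return float(np.mean((X @ w - y) ** 2))

    return {
        "interp_train_mse": train_mse(w_interp),
        "ridge_train_mse": train_mse(w_ridge),
        "interp_norm": float(np.linalg.norm(w_interp)),
        "ridge_norm": float(np.linalg.norm(w_ridge)),
    }


if __name__ == "__main__":
    for name, value in fit_at_threshold().items():
        print(f"{name}: {value:.4g}")
```

In this toy setup the interpolating solution has essentially zero training error but a larger norm, while the penalized solution does not completely fit the data. Whether real training procedures behave analogously at the threshold is exactly the empirical question under discussion.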