Or are you saying that even if there’s only one model at the interpolation threshold that fits the data, you’d expect the training procedure to pick a different model (one that doesn’t completely fit the data) instead, because of the bias towards generalizability?
Yup, that.