Okay, I think I get what you’re saying now: more SGD steps should increase “effective model capacity”, so per the double descent intuition we should expect the validation loss to first increase and then decrease (as is indeed observed). Is that right?