I wonder if this is a neural network thing, an SGD thing, or a both thing?
Neither, actually—it’s more general than that. Belkin et al. show that it happens even for simple models like decision trees. Also see here for an example with polynomial regression.
Are you aware of this work and the papers they cite?
Yeah, I am. I definitely think that stuff is good, though ideally I want something more than just “approximately K-complexity.”
Neither, actually—it’s more general than that. Belkin et al. show that it happens even for simple models like decision trees. Also see here for an example with polynomial regression.
Yeah, I am. I definitely think that stuff is good, though ideally I want something more than just “approximately K-complexity.”