What I’m suggesting is that volume in high dimensions can concentrate on the boundary.
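For intuition, a standard concentration-of-measure fact (not specific to this discussion): the fraction of a unit ball’s volume in $\mathbb{R}^d$ lying within distance $\epsilon$ of its boundary is $1 - (1-\epsilon)^d$, which tends to $1$ as $d \to \infty$ for any fixed $\epsilon \in (0,1)$. E.g., with $\epsilon = 0.01$ and $d = 1000$, roughly $99.996\%$ of the volume already sits in that thin shell.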
Yes. I imagine this is why overtraining doesn’t make a huge difference.
Falsifiable Hypothesis: Compare SGD with overtraining to the random sampling algorithm. You should see that functions that are unlikely to be generated by random sampling become more likely under SGD with overtraining, and, conversely, functions that are more likely under random sampling become less likely under SGD with overtraining.
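This comparison is easy to prototype. Below is a minimal sketch (my own, not from the paper) on a toy Boolean task, assuming a tiny MLP, a “copy the first bit” target, and function identity defined by behaviour on held-out inputs; the task, architecture, hyperparameters, and helper names are all illustrative.

```python
import torch
import torch.nn as nn
from collections import Counter

torch.manual_seed(0)

N_BITS = 3                      # tiny input space so the same functions recur
X = torch.tensor([[(i >> b) & 1 for b in range(N_BITS)]
                  for i in range(2 ** N_BITS)], dtype=torch.float32)
y = X[:, 0:1]                   # illustrative target: copy the first bit
train_idx = [0, 1, 2, 3, 4]     # fit these points; the learned function's
test_idx = [5, 6, 7]            # identity is its behaviour on the held-out rest

def make_net():
    return nn.Sequential(nn.Linear(N_BITS, 8), nn.Tanh(), nn.Linear(8, 1))

def function_id(net):
    """Identify a function by its thresholded outputs on held-out inputs."""
    with torch.no_grad():
        return tuple((net(X[test_idx]) > 0).squeeze(1).int().tolist())

def fits_train(net):
    with torch.no_grad():
        return bool(((net(X[train_idx]) > 0).float() == y[train_idx]).all())

def sample_random():
    """Rejection-sample parameter vectors until one fits the training set."""
    while True:
        net = make_net()
        for p in net.parameters():
            nn.init.normal_(p, std=1.0)
        if fits_train(net):
            return function_id(net)

def sample_sgd(extra_steps=2000):
    """Run SGD to zero training error, then keep going ('overtraining')."""
    net = make_net()
    opt = torch.optim.SGD(net.parameters(), lr=0.1)
    loss_fn = nn.BCEWithLogitsLoss()
    def step():
        opt.zero_grad()
        loss_fn(net(X[train_idx]), y[train_idx]).backward()
        opt.step()
    steps = 0
    while not fits_train(net) and steps < 50_000:   # train to interpolation
        step(); steps += 1
    for _ in range(extra_steps):                    # then overtrain
        step()
    return function_id(net)

n_runs = 200
random_counts = Counter(sample_random() for _ in range(n_runs))
sgd_counts = Counter(sample_sgd() for _ in range(n_runs))
for f in sorted(set(random_counts) | set(sgd_counts)):
    print(f, random_counts[f] / n_runs, sgd_counts[f] / n_runs)
```

The hypothesis predicts the two printed frequency columns should disagree systematically: functions rare in the first column should gain mass in the second, and vice versa.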
See, e.g., page 47 of the main paper.