The characterization of basin dimension here is super interesting. But it sounds like most of the framing is in terms of local minima. My understanding is that saddle points are much more likely in high dimensional landscapes (eg, see https://arxiv.org/abs/1406.2572) since there is likely always some direction leading to smaller loss.
How does your model complexity measure work for saddle points? The quotes below suggest there could be issues, although I imagine the measure makes sense as long as the weights are sampled around the saddle (and not falling into another basin).
Currently, if not applied at a local minimum, the estimator can sometimes yield unphysical negative model complexities.
This occurs when the sampler strays beyond its intended confines and stumbles across models with much lower loss than those in the desired neighborhood.
This is an open question. In practice it seems to work fine even at strict saddles (i.e. things where there are no negative eigenvalues in the Hessian but there are still negative directions, i.e. they show up at higher than second order in the Taylor series), in the sense that you can get sensible estimates and they indicate something about the way structure is developing, but the theory hasn’t caught up yet.
The characterization of basin dimension here is super interesting. But it sounds like most of the framing is in terms of local minima. My understanding is that saddle points are much more likely in high dimensional landscapes (eg, see https://arxiv.org/abs/1406.2572) since there is likely always some direction leading to smaller loss.
How does your model complexity measure work for saddle points? The quotes below suggest there could be issues, although I imagine the measure makes sense as long as the weights are sampled around the saddle (and not falling into another basin).
This is an open question. In practice it seems to work fine even at strict saddles (i.e. things where there are no negative eigenvalues in the Hessian but there are still negative directions, i.e. they show up at higher than second order in the Taylor series), in the sense that you can get sensible estimates and they indicate something about the way structure is developing, but the theory hasn’t caught up yet.