So if I understand correctly, your result is aiming at letting us estimate the dimensionality of the solution basins based on the gradients for the training examples at my local min/final model? Like, I just have to train my model, and then compute the Hessian/behavior gradients and I would (if everything you’re looking at works as intended) have a lot of information about the dimensionality of the basin (and I guess the modularity is what you’re aiming at here)? That would be pretty nice.
What other applications do you see for this result?
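A minimal sketch of that recipe, under assumptions that are not from the post: the `model` below is a hypothetical three-parameter toy, the behavior gradients are taken by finite differences instead of autodiff, and "dimensionality" means the first-order count N − rank(G), where the rows of G are the gradients of the output on each input with respect to the parameters.

```python
# Hypothetical toy model with N = 3 parameters; theta[2] is deliberately unused,
# so it is always a free direction.
def model(theta, x):
    return (theta[0] + theta[1]) * x + theta[0] * theta[1]


def behavior_gradient(theta, x, eps=1e-6):
    """Central finite-difference gradient of model(theta, x) w.r.t. theta."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += eps
        down[i] -= eps
        grad.append((model(up, x) - model(down, x)) / (2 * eps))
    return grad


def matrix_rank(rows, tol=1e-6):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]
    rank = 0
    for col in range(len(m[0])):
        # Pick the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot: this column adds no new direction
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


def free_directions(theta, inputs):
    """First-order estimate: N minus the rank of the stacked behavior gradients."""
    grads = [behavior_gradient(theta, x) for x in inputs]
    return len(theta) - matrix_rank(grads)


inputs = [1.0, 2.0, 3.0]
print(free_directions([1.0, 2.0, 0.5], inputs))  # gradients span two directions
print(free_directions([1.0, 1.0, 0.5], inputs))  # all three gradients are parallel
```

With three inputs and three parameters, N − k is 0, yet the first call reports 1 free direction and the second reports 2, because the gradients are linearly dependent. This is only a local, first-order count at the chosen point; it says nothing about how far the basin extends.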
Each plane here is an (n-1)-dimensional manifold, where every model on that plane has the same output on input 1. They slice parameter space into (n-1)-dimensional regions. Each of these regions is an equivalence class of functions, which all behave about the same on input 1.
Are the 1-contours always connected? Is the idea that you can continuously vary the parameters while keeping the same output? Based on your illustration it would seem so, but it’s not obvious to me that you can always interpolate in model space between models with the same behavior.
However, if the contours are parallel:
Now the behavior manifolds are planes, running parallel to the contours. So we see here that parallel contours allow behavior manifolds to have dimension > N−k.
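One way to write the dimension count down, as a sketch: the symbols f_i (the output on input i as a function of the parameters θ), J, and r are notation introduced here, not the post's, and the statement assumes the f_i are smooth and that J has constant rank near the point in question.

```latex
J(\theta) =
\begin{pmatrix}
  \nabla_\theta f_1(\theta)^\top \\
  \vdots \\
  \nabla_\theta f_k(\theta)^\top
\end{pmatrix}
\in \mathbb{R}^{k \times N},
\qquad
r = \operatorname{rank} J(\theta) \le k,
\qquad
\dim(\text{behavior manifold}) = N - r \;\ge\; N - k .
```

Equality holds when the k gradients are linearly independent. Parallel gradients lower r, which is how the dimension can exceed N−k.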
I’m geometrically confused here: if the contours are parallel, then aren’t the behavior manifolds made by their intersection empty?
Thanks for the post!
About the contours: While the graphic shows a finite number of contours with some spacing, in reality there are infinitely many contour planes and they completely fill space (as densely as the reals, if we ignore float precision). So at literally every point in space there is a blue contour, and a red one which exactly coincides with it.
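A two-parameter example of coinciding contours may help; f_1 and f_2 here are chosen purely for illustration and are not from the post.

```latex
f_1(\theta) = \theta_1 + \theta_2,
\qquad
f_2(\theta) = 2(\theta_1 + \theta_2),
\qquad
\nabla f_2 = (2, 2) = 2\,\nabla f_1 .

% Through any point \theta^* with c = \theta^*_1 + \theta^*_2:
\{\theta : f_1(\theta) = c\}
  = \{\theta : \theta_1 + \theta_2 = c\}
  = \{\theta : f_2(\theta) = 2c\} .
```

The two contours through θ* are the same line, so their intersection is that whole line, not the empty set. Its dimension is 1 = N − 1, which is larger than N − k = 0.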
I’ll reply to the rest of your comment later today when I have some time.