From this paper, “Theoretical work limited to ReLU-type activation functions, showed that in overparameterized networks, all global minima lie in a connected manifold (Freeman & Bruna, 2016; Nguyen, 2019)”
So for overparameterized nets, the answer is probably:
There is only one solution manifold, so there are no separate basins: all solutions are connected to one another.
We can salvage the idea of “basin volume” as follows:
In the dimensions perpendicular to the manifold, calculate the basin cross-section using the Hessian.
In the dimensions parallel to the manifold, ask “how far can I move before it stops being the ‘same function’?”. If we define “sameness” as “same behavior on the validation set”,[1] then this means looking at the Jacobian of that behavior in the plane of the manifold.
Multiply the two hypervolumes to get the hypervolume of our “basin segment” (very roughly, the region of the basin that drains to our specific model).
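The recipe above can be sketched numerically in a toy linear setting. This is a minimal illustration, not the actual proposal: the loss, the validation-behavior map `B`, the cutoffs `c` and `eps`, and all variable names are assumptions chosen to make the geometry explicit. The loss is 0.5·‖Aw‖² with a rank-deficient `A`, so the global minima form a connected linear manifold (the null space of `A`), mirroring the connected-manifold result quoted above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy overparameterized setting: loss L(w) = 0.5 * ||A w||^2 with
# rank-deficient A, so the global minima form a connected linear
# manifold (the null space of A).
d, r = 6, 3                       # parameter dim, rank of A (illustrative)
A = rng.normal(size=(r, d))
H = A.T @ A                       # Hessian at any global minimum

# Split eigendirections into "perpendicular" (curved) and
# "parallel" (flat, i.e. tangent to the solution manifold).
eigvals, eigvecs = np.linalg.eigh(H)
flat = eigvals < 1e-10
V_par = eigvecs[:, flat]          # basis of the manifold's tangent plane
lam_perp = eigvals[~flat]         # curvatures perpendicular to it

# Perpendicular cross-section from the Hessian: the region with
# loss <= c is an ellipsoid with semi-axes sqrt(2c / lambda_i),
# so its volume is proportional to the product of those axes.
c = 1e-2                          # loss cutoff (assumed)
perp_vol = np.prod(np.sqrt(2 * c / lam_perp))

# Parallel extent: "same function" = validation outputs move by less
# than eps. With a linear validation-behavior map f(w) = B w (purely
# hypothetical), the Jacobian restricted to the tangent plane is
# B @ V_par, and the allowed displacement along each singular
# direction is eps / sigma_j.
B = rng.normal(size=(5, d))       # hypothetical validation-output map
sigma = np.linalg.svd(B @ V_par, compute_uv=False)
eps = 1e-2                        # "sameness" tolerance (assumed)
par_vol = np.prod(eps / sigma[sigma > 1e-10])

# Multiply the two hypervolumes to get the "basin segment" volume.
segment_vol = perp_vol * par_vol
print(perp_vol, par_vol, segment_vol)
```

In a real network the Hessian and Jacobian would come from the trained model and the manifold would be curved, so these products would only be local (per-point) estimates, but the split into perpendicular and parallel factors is the same.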
There are other “sameness” measures which look at the internals of the model; I will be proposing one in an upcoming post.