Seems like a start. But I think one primary issue with imagining these basins is just how high-dimensional they are.
Note also that we’re not just looking for visualisations of the loss landscape here. Due to the correspondence between information loss and basin broadness outlined in Vivek’s linked post, we want to look at the nullspace of the space spanned by the gradients of the network output for the individual data points.
EDIT: Gradients of the network output, not gradients of the loss function, sorry. The gradient of the loss function is zero at perfect training loss, so it carries no information there.
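To make that concrete, here is a minimal sketch of the computation (a hypothetical toy network and invented names, not anything from Vivek's post): stack the per-datapoint gradients of the network output into a Jacobian, then take the right-singular vectors with zero singular value. Those are the parameter directions that leave every output unchanged to first order.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy scalar-output net f(x) = v.T @ tanh(W @ x + b) + c  (hypothetical example)
W = rng.normal(size=(4, 2)); b = rng.normal(size=4)
v = rng.normal(size=4);      c = rng.normal()
n_params = W.size + b.size + v.size + 1  # 17

def output_grad(x):
    """Gradient of the network *output* (not the loss) w.r.t. all parameters."""
    h = np.tanh(W @ x + b)
    db = v * (1 - h ** 2)        # df/db
    dW = np.outer(db, x)         # df/dW
    return np.concatenate([dW.ravel(), db, h, [1.0]])  # order: W, b, v, c

X = rng.normal(size=(5, 2))                 # 5 data points
J = np.stack([output_grad(x) for x in X])   # shape (n_data, n_params)

# Nullspace of the span of the per-datapoint output gradients:
# right-singular vectors of J with (numerically) zero singular value.
U, S, Vh = np.linalg.svd(J, full_matrices=True)
rank = int((S > 1e-10 * S.max()).sum())
nullspace = Vh[rank:]  # directions leaving all outputs unchanged, to first order
print(J.shape, rank, nullspace.shape[0])
```

With far more parameters than data points, the nullspace is high-dimensional (here at least 12 of 17 directions), which is exactly why these basins are hard to picture.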