Exciting thoughts here! One initial thought I have is that broadness might be visualized with a topographic map over the loss landscape, with some threshold of ‘statistically indistinguishable’ loss forming the contour lines. The final loss would then sit in a ‘basin’ with a measurable area, and that area would give a broadness metric.
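For concreteness, here's a minimal sketch of that basin-area idea on a toy 2D slice of a loss landscape. Everything here is invented for illustration: the toy loss function, the grid, and the indistinguishability threshold (which in practice would depend on the noise in your loss estimates).

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy loss over a 2D slice of parameter space: a broad basin near (2, 2)
# next to a narrow one near the origin, standing in for a real network's
# loss surface. (Purely illustrative.)
def toy_loss(x, y):
    broad = 0.05 * ((x - 2) ** 2 + (y - 2) ** 2)
    narrow = 2.0 * (x ** 2 + y ** 2)
    return np.minimum(broad, narrow)

xs, ys = np.meshgrid(np.linspace(-3, 5, 400), np.linspace(-3, 5, 400))
loss = toy_loss(xs, ys)

# Threshold below which points count as 'statistically indistinguishable'
# from the minimum (assumed value).
threshold = loss.min() + 0.1

# Basin 'area' metric: number of grid cells under the threshold times the
# area of one cell (grid spans an 8 x 8 square with 400 points per axis).
cell_area = (8 / 400) ** 2
basin_area = (loss < threshold).sum() * cell_area
print(f"basin area at threshold {threshold:.3f}: {basin_area:.3f}")

# Topographic map: contour lines of the loss over the slice.
plt.contour(xs, ys, loss, levels=20)
plt.title("Contour map of a toy loss slice")
plt.show()
```

On this toy surface the broad basin contributes far more sub-threshold area than the narrow one, which is the behaviour the metric is meant to capture.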
Seems like a start. But I think one primary issue with imagining these basins is how high-dimensional they are.
Note also that we’re not just looking for visualisations of the loss landscape here. Due to the correspondence between information loss and broadness outlined in Vivek’s linked post, we want to look at the nullspace of the span of the gradients of the network output at individual data points.
EDIT: Gradient of network output, not gradient of the loss function, sorry. The gradient of the loss function is zero at perfect training loss.
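To make the EDIT concrete, here's a minimal sketch of the quantity in question: stack the per-datapoint gradients of the network output (not the loss) into a matrix and compute the dimension of its nullspace. The toy network and random data are stand-ins I've invented, not anything from the post.

```python
import torch

# Hypothetical toy network and data, purely for illustration.
net = torch.nn.Sequential(
    torch.nn.Linear(3, 8), torch.nn.Tanh(), torch.nn.Linear(8, 1)
)
data = torch.randn(20, 3)  # 20 individual data points

params = [p for p in net.parameters() if p.requires_grad]
n_params = sum(p.numel() for p in params)

# One row per data point: the gradient of the scalar network output
# with respect to all parameters.
rows = []
for x in data:
    out = net(x).squeeze()
    grads = torch.autograd.grad(out, params)
    rows.append(torch.cat([g.reshape(-1) for g in grads]))
G = torch.stack(rows)  # shape: (n_data, n_params)

# The nullspace of the span of these gradients has dimension
# n_params - rank(G). Directions in that nullspace leave every
# output unchanged to first order, i.e. they run along the bottom
# of the basin.
rank = torch.linalg.matrix_rank(G).item()
print(f"params: {n_params}, rank: {rank}, nullspace dim: {n_params - rank}")
```

Note that this matrix of output gradients is well-defined and generally nonzero at perfect training loss, unlike the loss gradient, which vanishes there.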