My ears perk up when I hear about approximations to basin size because it’s related to the Bayesian NN model of uncertainty.
Suppose you have a classifier that predicts a probability distribution over outputs. When we want the uncertainty of the weights, we just use Bayes' rule, and because most of the terms don't matter, what we mostly care about is that P(weights | dataset) is proportional to P(dataset | weights) times the prior. If you're training on a predictive loss, your loss is basically the negative log of this P(dataset | weights), and so a linear weighting of probability turns into an exponential weighting of loss.
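Spelled out (standard Bayes bookkeeping; I'm writing $w$ for weights, $D$ for the dataset, and $n$ for the number of training examples):

$$P(w \mid D) = \frac{P(D \mid w)\,P(w)}{P(D)} \propto P(D \mid w)\,P(w),$$

and if the loss is the average negative log-likelihood, $L(w) = -\frac{1}{n}\log P(D \mid w)$, then

$$P(w \mid D) \propto e^{-n L(w)}\,P(w).$$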
I.e. you end up (in theory; in practice this doesn't always work out) with a Boltzmann distribution sitting at the bottom of your loss basin, skewed by a regularization term (the prior). Broader loss basins directly translate to more uncertainty over weights.
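Here's a tiny sketch of that last claim (my own toy example, not anything from the post): treat a 1-D quadratic loss as a Boltzmann posterior on a grid, and check that lower curvature (a broader basin) means a wider posterior over the weight.

```python
import numpy as np

n = 100                         # training set size ("inverse temperature")
w = np.linspace(-3, 3, 10_001)  # 1-D weight grid
dw = w[1] - w[0]

def posterior_std(loss):
    """Std of the Boltzmann posterior p(w) ∝ exp(-n * L(w)) on the grid."""
    p = np.exp(-n * (loss - loss.min()))  # shift by the min for stability
    p /= p.sum() * dw                     # normalize to a density
    mean = (w * p).sum() * dw
    var = ((w - mean) ** 2 * p).sum() * dw
    return np.sqrt(var)

# Two quadratic "basins" with the same minimum but different curvature.
print(posterior_std(0.5 * 10.0 * w**2))  # narrow basin -> ~0.032
print(posterior_std(0.5 * 0.5 * w**2))   # broad basin  -> ~0.141
```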
Hm… But I guess thinking about this really just highlights for me the problems with the approximations used to get uncertainties out of the Bayesian NN picture. Knowing the learning coefficient is of limited use because, especially when some directions behave very differently from others, you can't really model all directions in weight-space as interchangeable and uncorrelated, so increased theoretical firepower doesn't translate to better uncertainty estimates as nicely as I'd like.
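To make that concrete (again my own toy numbers, and a crude volume-only caricature of a basin-size summary rather than the actual learning coefficient): take a 2-D quadratic basin with correlated, anisotropic curvature. A full-covariance Laplace posterior tracks each direction; a summary that keeps only the basin volume, one scalar, has to treat both directions the same and misjudges each of them.

```python
import numpy as np

n = 100
H = np.array([[10.0, 2.8],   # hypothetical anisotropic, correlated
              [2.8,  1.0]])  # curvature (Hessian) at the minimum

# Full Laplace approximation: posterior covariance is (n * H)^-1.
cov = np.linalg.inv(n * H)
print("full Laplace stds:  ", np.sqrt(np.diag(cov)))      # ~[0.068, 0.215]

# Volume-only summary: replace H by c*I with the same determinant
# (same basin volume, wrong shape).
c = np.sqrt(np.linalg.det(H))   # in 2-D, det(c * I) = c**2
cov_iso = np.linalg.inv(n * c * np.eye(2))
print("volume-matched stds:", np.sqrt(np.diag(cov_iso)))  # ~[0.083, 0.083]
```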
Pretty neat.