I’m not sure how necessary that is. If you want diverse good solutions, that sounds a lot like ‘sampling from the posterior’, and we know, thanks to Google burning a huge number of TPU-hours on true HMC sampling from Bayesian neural networks, that ‘deep ensembles’ (i.e. training multiple random initializations from scratch on the same dataset) actually give you a pretty good sample from the posterior. If there are lots of equally decent ways to classify an image expressible in a NN, then the deep ensemble will sample from them (and that is presumably why ensembling improves performance: the members are all doing something different, instead of weighting the same features the same amount). If that’s not adequate, it’d be good to think about what one really wants instead, and how to build that in (maybe one wants to do data augmentation to erase color from one dataset/model and shapes from another, to encourage a ventral-dorsal split or something).
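To make ‘deep ensemble’ concrete, here is a minimal sketch of the recipe as described above: same dataset, several from-scratch runs that differ only in their random seed, and predictions averaged at the end. Everything in it is an illustrative assumption rather than anyone's actual setup: the toy XOR-style data, the `TinyMLP` class, the `ensemble_predict` helper, and the hyperparameters are all made up for the example, and it says nothing about how close the result is to an HMC posterior sample.

```python
import math
import random


def make_data(n, rng):
    """Toy XOR-style dataset (hypothetical): label is 1 when x and y share a sign."""
    data = []
    for _ in range(n):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append(((x, y), 1.0 if x * y > 0 else 0.0))
    return data


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """2 -> hidden -> 1 network with tanh hidden units, trained by plain SGD."""

    def __init__(self, seed, hidden=8):
        self.rng = random.Random(seed)  # the seed is the only thing that differs between members
        self.w1 = [[self.rng.gauss(0, 1) for _ in range(2)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [self.rng.gauss(0, 1) for _ in range(hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(w[0] * x[0] + w[1] * x[1] + b) for w, b in zip(self.w1, self.b1)]
        p = sigmoid(sum(wi * hi for wi, hi in zip(self.w2, h)) + self.b2)
        return h, p

    def predict(self, x):
        return self.forward(x)[1]

    def train(self, data, epochs=100, lr=0.1):
        data = list(data)
        for _ in range(epochs):
            self.rng.shuffle(data)
            for x, y in data:
                h, p = self.forward(x)
                d_out = p - y  # gradient of cross-entropy w.r.t. the output logit
                for j, hj in enumerate(h):
                    d_h = d_out * self.w2[j] * (1.0 - hj * hj)  # backprop through tanh
                    self.w2[j] -= lr * d_out * hj
                    self.w1[j][0] -= lr * d_h * x[0]
                    self.w1[j][1] -= lr * d_h * x[1]
                    self.b1[j] -= lr * d_h
                self.b2 -= lr * d_out


def ensemble_predict(members, x):
    """Average the members' predicted probabilities with uniform weights."""
    return sum(m.predict(x) for m in members) / len(members)


def accuracy(predict, data):
    return sum((predict(x) > 0.5) == (y == 1.0) for x, y in data) / len(data)


data_rng = random.Random(0)
train, test = make_data(200, data_rng), make_data(200, data_rng)

# Same data, same architecture, different random initialization per member.
members = [TinyMLP(seed=s) for s in range(5)]
for m in members:
    m.train(train)

member_accs = [accuracy(m.predict, test) for m in members]
ensemble_acc = accuracy(lambda x: ensemble_predict(members, x), test)

# Disagreement on a point near the decision boundary is a rough proxy
# for the members "doing something different".
probe = (0.05, -0.05)
spread = [round(m.predict(probe), 3) for m in members]

print("member accuracies:", member_accs)
print("ensemble accuracy:", ensemble_acc)
print("member predictions near the boundary:", spread)
```

The point of the `spread` printout is just to eyeball how much the independently initialized members disagree where the data is ambiguous; whether that disagreement is ‘diverse enough’ for a given purpose is exactly the question of what one really wants.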
You can also steer optimization to find ‘diverse’ models, like Ridge Rider: https://arxiv.org/abs/2011.06505
Thanks! Very useful feedback.