I still think it’s not a great choice, though clearly my other choices haven’t worked well. But please do try it.
Given that the probability is a continuous distribution, the Fisher information might instead be a reasonable thing to look at. For a single distribution, maximizing it corresponds to minimizing the variance, so my suggestion for that wasn’t as ad-hoc as I thought. I’m not sure the equivalence holds for multiple distributions.
I still think it’s not a great choice, though clearly my other choices haven’t worked well. But please do try it.
Given that the probability is a continuous distribution, the Fisher information might instead be a reasonable thing to look at. For a single distribution, maximizing it corresponds to minimizing the variance, so my suggestion for that wasn’t as ad-hoc as I thought. I’m not sure the equivalence holds for multiple distributions.