Suppose I have a distribution of 2 kids and a professional boxer, and a random one of them is going to hit me. Argmax tells me that I will always be hit by a kid. Sure, if you draw from the distribution only once, argmax will beat the mean in 2/3 of the cases, but it's much worse at answering what will happen if I draw 9 hits (argmax = no hits from the boxer, mean = 3 hits from the boxer).
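A minimal sketch of the 9-hit arithmetic, assuming each hit is an independent uniform draw from the three attackers (the variable names are just for illustration):

```python
# Two kids and one boxer; each hit is assumed to be an independent uniform draw.
attackers = ["kid", "kid", "boxer"]
n_draws = 9

# Argmax (mode) prediction: the most likely attacker is a kid,
# so it predicts zero boxer hits no matter how many draws there are.
mode_attacker = max(set(attackers), key=attackers.count)
mode_boxer_hits = 0 if mode_attacker == "kid" else n_draws

# Mean prediction: expected number of boxer hits over n_draws.
mean_boxer_hits = n_draws * attackers.count("boxer") / len(attackers)

# Probability that the argmax prediction (no boxer hits at all) comes true.
p_no_boxer_hits = (attackers.count("kid") / len(attackers)) ** n_draws

print(mode_attacker)              # kid
print(mean_boxer_hits)            # 3.0
print(round(p_no_boxer_hits, 3))  # 0.026
```

So over 9 hits, the "always a kid" answer is exactly right less than 3% of the time under these assumptions.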
This distribution is skewed, like the beta distribution, and is therefore better summarized by the mean than the mode.
In Bayesian statistics, taking the argmax over sigma will often lead to sigma=0 if you assume that sigma follows an exponential distribution (its density peaks at zero), and thus it will lead you to assume that there is no variance in your sample.
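To see why the argmax lands on zero, here is a rough sketch that looks only at an exponential density for sigma (not a full posterior with data). The rate of 1.0 and the evaluation grid are arbitrary assumptions for illustration:

```python
import math

def exponential_pdf(sigma, rate=1.0):
    """Density of an exponential distribution; zero for negative sigma."""
    return rate * math.exp(-rate * sigma) if sigma >= 0 else 0.0

rate = 1.0  # assumed rate, chosen only for illustration

# Evaluate the density on a grid from 0 to 10 and pick the argmax.
grid = [i / 100 for i in range(0, 1001)]
mode_sigma = max(grid, key=lambda s: exponential_pdf(s, rate))

# The mean of an exponential distribution is 1 / rate.
mean_sigma = 1 / rate

print(mode_sigma)  # 0.0
print(mean_sigma)  # 1.0
```

The density is strictly decreasing, so the mode sits at sigma=0 for any rate, while the mean stays positive.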
The expected squared deviation is also lower around the mean than around the mode (the mean is the point that minimizes it), if that counts as a theoretical justification :)
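A quick check of that claim on the boxer example, assuming we code a hit as 1 if it comes from the boxer and 0 if it comes from a kid (this encoding is my assumption, not part of the original setup):

```python
from fractions import Fraction

# Assumed encoding: 0 = hit by a kid (prob 2/3), 1 = hit by the boxer (prob 1/3).
outcomes = {0: Fraction(2, 3), 1: Fraction(1, 3)}

def expected_squared_deviation(center):
    """E[(X - center)^2] under the distribution above."""
    return sum(p * (x - center) ** 2 for x, p in outcomes.items())

mean = sum(p * x for x, p in outcomes.items())  # 1/3
mode = max(outcomes, key=outcomes.get)          # 0

around_mean = expected_squared_deviation(mean)
around_mode = expected_squared_deviation(mode)

print(around_mean)  # 2/9
print(around_mode)  # 1/3
```

The spread around the mean (2/9) is smaller than around the mode (1/3), and the same ordering holds for any distribution whose mean and mode differ.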