I think you mean moments, not modes (here and twice more in the same paragraph). I mention this for the benefit of anyone reading this and googling for more information.
has no higher [moments]
I’m guessing you mean “has higher moments matching those of the normal distribution” or something, but I don’t see any advantage of this formulation over the simpler “is normally distributed” (or, since you’re talking about a dataset rather than the random process that generated it, something like “is drawn from a normal distribution”). Usually, saying something like “such-and-such a distribution has no fourth moment” means something very different (and incompatible with being normal): that its tails are fat enough that the fourth moment is undefined, because the relevant integral diverges.
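(If it helps to see the “no fourth moment” case concretely, here’s a minimal numpy sketch, not part of the original comment; the sample sizes and seed are arbitrary. It compares empirical fourth moments of normal samples with those of Student-t samples with 3 degrees of freedom, a distribution whose fourth-moment integral diverges.)

```python
import numpy as np

rng = np.random.default_rng(0)

for n in (10**3, 10**5, 10**7):
    normal = rng.standard_normal(n)
    heavy = rng.standard_t(df=3, size=n)  # t with 3 df: fourth moment undefined
    print(f"n={n:>8}   E[X^4] estimate:  normal={np.mean(normal**4):6.2f}   "
          f"t(3)={np.mean(heavy**4):12.2f}")

# The normal column settles near 3 (the true fourth moment of N(0,1));
# the t(3) column keeps wandering / blowing up as n grows, because the
# integral it is trying to estimate doesn't exist.
```

Run it with a few different seeds and the t(3) numbers jump around by orders of magnitude, which is roughly what “no fourth moment” looks like in practice.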
There’s a deeper connection between means and normality. One of the reasons why you might summarize a random variable by its mean is that the mean minimizes the expected squared error: that is, if you’ve got a random variable X and you want to choose x so that E[(X-x)^2] is as small as possible, the correct choice for x is E[X], the mean of X. Or, if you have a dataset (x1,...,xn) and you want to choose x so that the mean of (xi-x)^2 is as small as possible, then the correct choice is the mean of the xi.

OK, so why would you want to do that particular thing? Well, if your data are independent samples from a normal distribution, then minimizing the mean of (xi-x)^2 is the same thing as maximizing the likelihood (i.e., roughly, the probability of getting those samples rather than some other set of data), because the normal log-likelihood is, up to an additive constant, -sum((xi-x)^2)/(2 sigma^2). (Which is the same thing as maximizing the posterior probability, if you start out with no information about the mean of the distribution.) So for normally distributed data, choosing the mean of your sample gives you the same result as maximum likelihood.

But if what you know is, e.g., that your data are drawn from a Cauchy distribution with unknown parameters, then taking the mean of the samples will not help you at all: the mean of n independent Cauchy samples has exactly the same distribution as a single sample, so averaging more data gets you no closer to the location parameter.
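(Again not part of the original comment, but a minimal sketch of that last contrast, assuming numpy and a true location of 0 for both distributions: for normal data the sample mean homes in on the location parameter; for Cauchy data it doesn’t, though the sample median still does.)

```python
import numpy as np

rng = np.random.default_rng(1)

for n in (10**2, 10**4, 10**6):
    normal = rng.normal(loc=0.0, scale=1.0, size=n)
    cauchy = rng.standard_cauchy(size=n)  # location 0, scale 1
    print(f"n={n:>7}   normal mean={np.mean(normal):+8.4f}   "
          f"cauchy mean={np.mean(cauchy):+10.4f}   "
          f"cauchy median={np.median(cauchy):+8.4f}")

# The normal means and the Cauchy medians shrink toward the true location 0
# as n grows; the Cauchy means don't, since averaging n Cauchy samples just
# gives you another Cauchy sample with the same spread.
```

The maximum-likelihood estimate of the Cauchy location (which takes a bit of numerical optimization) would also converge; the point is only that the sample mean, which happens to be the maximum-likelihood answer in the normal case, is the wrong summary here.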