After a contribution to a previous thread I thought some more about what I actually wanted to say, so here is a much more succinct version:
The average of any distribution, or even worse of a dataset, is not a sufficient description without a statement about the distribution.
So often research results are reported as a simple average with a standard deviation. The educated statistician will recognise these two numbers as the first two modes of a distribution. But these two modes completely describe a distribution if and only if it is a normal distribution. Though the central limit theorem gives us justification to use it in quite a number of cases, in general we need to make sure that the dataset has no higher modes. The most obvious case is that of a dataset dominated by a single binary random variable.
This statement then, that not all datasets are normally distributed, holds for any field, be it solid state physics, astrophysics, biochemistry, evolutionary biology, population ecology, welfare economics or psychology. To assume that any average together with a standard deviation derives from a normal distribution, or even worse that there is no more information in the dataset or the underlying phenomenon, is a grave scientific mistake.
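A minimal numerical sketch of the binary-variable case, assuming Python with NumPy (the specific numbers are illustrative only): two datasets with essentially the same mean and standard deviation, one normal and one taking just two values.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Normal data: mean 0, standard deviation 1.
normal_data = rng.normal(loc=0.0, scale=1.0, size=n)

# "Binary" data: +1 or -1 with equal probability, which also has mean 0 and standard deviation 1.
binary_data = rng.choice([-1.0, 1.0], size=n)

for name, data in [("normal", normal_data), ("binary", binary_data)]:
    print(f"{name:7s} mean = {data.mean():+.3f}   sd = {data.std():.3f}")

# Both lines print roughly mean = +0.000, sd = 1.000, yet one dataset is
# bell-shaped and the other consists of nothing but -1s and +1s.
```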
I think you mean moments, not modes (here and twice more in the same paragraph). I mention this for the benefit of anyone reading this and googling for more information.
has no higher [moments]
I’m guessing you mean “has higher moments matching those of the normal distribution” or something, but I don’t see any advantage of this formulation over the simpler “is normally distributed” (or, since you’re talking about a dataset rather than the random process that generated it, something like “is drawn from a normal distribution”). Usually, saying something like “such-and-such a distribution has no fourth moment” means something very different (and incompatible with being normal): that its tails are fat enough that the fourth moment is undefined on account of the relevant integral being divergent.
There’s a deeper connection between means and normality. One of the reasons why you might summarize a random variable by its mean is that the mean minimizes the expected squared error: that is, if you’ve got a random variable X and you want to choose x so that E[(X-x)^2] is as small as possible, the correct choice for x is E[X], the mean of X. Or, if you have a dataset (x1,...,xn) and you want to choose x so that the mean of (xi-x)^2 is as small as possible, then the correct choice is the mean of the xi.

OK, so why would you want to do that particular thing? Well, if your data are independent samples from a normal distribution, then minimizing the mean of (xi-x)^2 is the same thing as maximizing the likelihood (i.e., roughly, the probability of getting those samples rather than some other set of data). (Which is the same thing as maximizing the posterior probability, if you start out with no information about the mean of the distribution.) So for normally distributed data, choosing the mean of your sample gives you the same result as max likelihood.

But if what you know, e.g., is that your data are drawn from a Cauchy distribution with unknown parameters, then taking the mean of the samples will not help you at all.
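A quick sketch of that last point, assuming Python with NumPy: draw samples of growing size from a normal and a Cauchy distribution and watch what the sample mean does.

```python
import numpy as np

rng = np.random.default_rng(1)

for n in (100, 10_000, 1_000_000):
    normal_sample = rng.normal(loc=0.0, scale=1.0, size=n)
    cauchy_sample = rng.standard_cauchy(size=n)  # location 0, scale 1
    print(f"n = {n:>9,}:  normal mean = {normal_sample.mean():+8.3f},  "
          f"Cauchy mean = {cauchy_sample.mean():+10.3f},  "
          f"Cauchy median = {np.median(cauchy_sample):+8.3f}")
```

The normal sample means settle down near 0 as n grows, while the Cauchy sample means keep jumping around no matter how many samples you take (the Cauchy distribution has no mean at all); the Cauchy sample median, by contrast, does converge to the true location.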
The educated statistician will recognise these two numbers as the first two modes of a distribution. But these two modes completely describe a distribution if and only if it is a normal distribution.
(The “only if” is incorrect. For many other families of distributions, knowing mean and variance is also sufficient to pinpoint a unique distribution.)
I must have mixed it up with some other statement.
“Yeah, sorry I said something that was incorrect. I meant to say something that wasn’t incorrect.”
I’ve seen more ballsy responses than this, but not many.
I don’t understand. Metus flatly admitted error, end of story.
For clarity, I found what Metus said to be very funny. I commented because I wanted to underscore the humour, not because I wanted to be critical.
FWIW, I also read it as an insult. And though I do believe you that that wasn’t your intent, I don’t see how else to read it even now.
Well, it wasn’t intended as a kind comment either, but it clearly fell a lot flatter than I thought it would.