Averaging samples from a population with log-normal distribution
This was originally a comment on this post by mruwnik regarding averaging various distributions with different distributions. I made it a post to include pictures.
The Central Limit Theorem, henceforth CLT, states (in my own words) that regardless of the distribution of a population, as long as its variance is finite, averages of sufficiently large random samples from that population are approximately normally distributed.
In theory it should hold for log-normal distributions, but that doesn’t feel intuitive to me, so I tested it.
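For reference, here is a sketch of the classical (Lindeberg-Lévy) form of the theorem, together with the log-normal moments that make it applicable. The symbols (X_i, mu, sigma, m, s) are notation introduced for this sketch, not something from the original comment. For independent, identically distributed draws X_1, ..., X_n with mean mu and finite variance sigma^2:

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\sqrt{n}\,\bigl(\bar{X}_n - \mu\bigr) \xrightarrow{\;d\;} \mathcal{N}(0, \sigma^2)
\quad \text{as } n \to \infty .

% Log-normal case: if \ln X \sim \mathcal{N}(m, s^2), both moments are finite:
\mathbb{E}[X] = e^{\,m + s^2/2},
\qquad
\operatorname{Var}[X] = \bigl(e^{s^2} - 1\bigr)\, e^{\,2m + s^2} .
```

Since the variance of a log-normal is finite for any finite s, the theorem applies. What it does not say is how large n has to be before the approximation looks good, which is the part worth testing.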
A silly example of CLT
An example I made up in my head to make sense of it:
Imagine a population made up of all the people who nap twice a day. Let’s plot the ages of this population:
Mostly infants and elderly people nap, hence the shape of the graph. This data is NOT normal. But if you randomly pick a small sample (n=10) from this population and average it, it will have a mix of old people and infants that averages to middle age. For example, imagine the ages are 80, 2, 1, 2, 75, 76, 1, 1, 85, 70; these average to about 39. If you do this over and over again with randomly chosen samples, the averages will be approximately normally distributed.
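A minimal sketch of the napping example, assuming a made-up population: half infants with ages uniform on 0 to 3 and half elderly with ages uniform on 70 to 90. Those ranges, the seed, and the number of repetitions are all assumptions chosen for illustration, not data.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Hypothetical bimodal population of nappers (assumed age ranges).
infants = rng.uniform(0, 3, size=50_000)
elderly = rng.uniform(70, 90, size=50_000)
ages = np.concatenate([infants, elderly])

# The ten ages from the worked example in the text.
example = [80, 2, 1, 2, 75, 76, 1, 1, 85, 70]
example_mean = sum(example) / len(example)
print(example_mean)  # 39.3

# Draw 10,000 random samples of n=10 (with replacement) and average each one.
samples = rng.choice(ages, size=(10_000, 10))
sample_means = samples.mean(axis=1)

# The averages cluster around the population mean even though almost
# nobody in the population is actually middle-aged.
print(ages.mean(), sample_means.mean(), sample_means.std())
```

Plotting a histogram of `sample_means` should show a single hump around 40, in contrast to the two separate humps in `ages`.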
Does it work with log-normal populations?
I didn’t find it intuitive that this would work for a log-normal population.
If I take data that is log-normal but split it into small samples, will the average of those small samples be normally distributed?
Chess matches:
I am arranging a chess tournament. I need to figure out how long the average match is so I can plan accordingly. I hear that chess matches seem to follow a log-normal distribution, but I’m not sure what that means statistically so I will try to just average the game times.
Data from the population
This is what my fake population (n=100,000) looks like. It’s log-normal.
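One way such a fake population could be generated is sketched below. The post does not say which parameters were used, so the values here (log of match length in minutes with mean 3.5 and standard deviation 0.6) are assumptions picked only to give plausible-looking game times.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the run is repeatable

# Assumed parameters: ln(match length in minutes) ~ Normal(3.5, 0.6).
population = rng.lognormal(mean=3.5, sigma=0.6, size=100_000)

# Taking logs of log-normal data should give back roughly normal data
# with the mean and standard deviation chosen above.
log_lengths = np.log(population)
print(log_lengths.mean(), log_lengths.std())

# The long right tail pulls the mean above the median.
print(np.median(population), population.mean())
```

The gap between the median and the mean is the practical reason the distribution matters for planning: a few very long games drag the average up.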
Tournaments
I observe tournaments (n=100 games each) and take a simple average of the match lengths.
Here is a histogram of the tournament averages:
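The tournament step can be sketched roughly as follows. The log-normal parameters, the number of tournaments (5,000), and the `skewness` helper are assumptions introduced for this example; only the tournament size of 100 games comes from the text.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed so the run is repeatable

# Assumed fake population, regenerated so this block runs on its own.
population = rng.lognormal(mean=3.5, sigma=0.6, size=100_000)

def skewness(x):
    """Sample skewness: about 0 for symmetric data, positive for a right tail."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return (centered**3).mean() / x.std()**3

# Each row is one tournament of 100 games drawn at random (with replacement).
n_tournaments, games = 5_000, 100
tournaments = rng.choice(population, size=(n_tournaments, games))
tournament_means = tournaments.mean(axis=1)

# Histogram counts of the tournament averages (plot these with any tool).
counts, bin_edges = np.histogram(tournament_means, bins=30)

print("population skewness:", skewness(population))
print("tournament-average skewness:", skewness(tournament_means))
```

Skewness is used here as a rough numeric stand-in for eyeballing the histogram: the closer it is to zero, the more symmetric (and bell-like) the tournament averages are.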
Lessons?
The sample size does matter here. With a sample size that is too small (n=10), the averages still look a lot like the original skewed log-normal distribution. This is expected: as the sample size moves from small to large, you get a range of smoothing effects that push the distribution of averages toward normal, and that distribution keeps narrowing until it collapses to a single point when the sample equals the population.
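To see the effect of sample size more directly, here is a rough sketch that repeats the averaging for several values of n. The population parameters, the sample sizes (1, 10, 100, 1000), and the helper names are assumptions for illustration. Sampling is done with replacement, so the spread shrinks steadily with n rather than hitting exactly zero.

```python
import numpy as np

rng = np.random.default_rng(3)  # fixed seed so the run is repeatable

# Assumed fake population, regenerated so this block runs on its own.
population = rng.lognormal(mean=3.5, sigma=0.6, size=100_000)

def skewness(x):
    """Sample skewness: about 0 for symmetric data, positive for a right tail."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return (centered**3).mean() / x.std()**3

def skew_and_spread(n, n_samples=5_000):
    """Average n_samples random samples of size n; return (skewness, std) of the averages."""
    means = rng.choice(population, size=(n_samples, n)).mean(axis=1)
    return skewness(means), means.std()

results = {n: skew_and_spread(n) for n in (1, 10, 100, 1000)}
for n, (skew, spread) in results.items():
    print(f"n={n:5d}  skewness={skew:6.3f}  std of averages={spread:7.3f}")
```

With n=1 the "averages" are just single games, so they reproduce the population. As n grows, both the skewness and the spread of the averages should fall.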