I want a deeper understanding of the basic concepts. Like, mean is an indicator of the central tendency of a sample. Intuitively, it makes sense. But why this particular formula of sum/n? You can apply all kinds of mathematical stuff to the sample.
The mean of the sum of two random variables is the sum of the means (ditto with the variances, provided the variables are uncorrelated); there’s no similarly simple formula for the median. (See ChristianKl’s comment for why you’d care about the sum.)
The mean is the value of x that minimizes SUM_i (x - x_i)^2: if you have to approximate all elements in your sample with the same value, and the cost of an imperfect approximation is the squared distance from the exact value (and any smooth cost function typically looks like a square when you’re sufficiently close to its minimum), then you should use the mean.
The sample mean and sample variance are jointly sufficient statistics for the normal distribution.
Possibly something else which doesn’t come to my mind at the moment.
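If you want to see the first two points numerically, here is a rough sketch in Python (standard library only). The samples, the seed, and the helper name squared_cost are all made up for illustration; it only checks that means add under summation and that the mean does at least as well as the median or a nearby guess under squared cost.

```python
import random
import statistics

random.seed(0)  # arbitrary seed, only so the run is reproducible

# Hypothetical samples, invented for this illustration:
# one roughly symmetric, one skewed.
xs = [random.gauss(10, 2) for _ in range(1000)]
ys = [random.expovariate(0.5) for _ in range(1000)]


def squared_cost(c, sample):
    """Total squared distance between a single guess c and every element."""
    return sum((c - v) ** 2 for v in sample)


mean_x = statistics.fmean(xs)
median_x = statistics.median(xs)

# Squared cost of approximating every element of xs by one number.
cost_at_mean = squared_cost(mean_x, xs)
cost_at_median = squared_cost(median_x, xs)
cost_nearby = squared_cost(mean_x + 0.01, xs)

# Means add: the mean of the elementwise sums equals the sum of the means
# (up to floating-point rounding).
sums = [x + y for x, y in zip(xs, ys)]
mean_of_sum = statistics.fmean(sums)
sum_of_means = statistics.fmean(xs) + statistics.fmean(ys)

# Medians have no such guarantee; these two numbers generally differ.
median_of_sum = statistics.median(sums)
sum_of_medians = statistics.median(xs) + statistics.median(ys)

print("mean of sum:", mean_of_sum, "sum of means:", sum_of_means)
print("median of sum:", median_of_sum, "sum of medians:", sum_of_medians)
print("cost at mean:", cost_at_mean, "cost at median:", cost_at_median)
```

Shifting the guess away from the mean by d raises the squared cost by exactly n*d^2, which is why no other single value can beat it.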
(Of course, all this means that if you’re more likely to multiply things together than add them, the badness of an approximation depends on the ratio between it and the true value rather than the difference, and things are distributed log-normally, you should use the geometric mean instead. Or just take the log of everything.)
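To make the "just take the log of everything" remark concrete, here is a small Python sketch. The data and the helper squared_log_ratio_cost are hypothetical; the cost function is one possible way to measure badness by ratio rather than by difference, not the only one.

```python
import math
import statistics

# Hypothetical positive data spanning several orders of magnitude.
values = [1.0, 10.0, 100.0, 1000.0]

# Take the log of everything, average, then undo the log.
log_mean = statistics.fmean([math.log(v) for v in values])
geo_mean_via_logs = math.exp(log_mean)

# statistics.geometric_mean (Python 3.8+) computes the same quantity directly.
geo_mean = statistics.geometric_mean(values)

arith_mean = statistics.fmean(values)


def squared_log_ratio_cost(c, sample):
    """Cost that depends on the ratio c / v instead of the difference c - v."""
    return sum(math.log(c / v) ** 2 for v in sample)


cost_geo = squared_log_ratio_cost(geo_mean, values)
cost_arith = squared_log_ratio_cost(arith_mean, values)

print("geometric mean:", geo_mean, "via logs:", geo_mean_via_logs)
print("ratio cost at geometric mean:", cost_geo)
print("ratio cost at arithmetic mean:", cost_arith)
```

Under this ratio-based cost the geometric mean plays the role the arithmetic mean plays for squared differences, because the problem becomes ordinary least squares once everything is on a log scale.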