It’s possibly worth fitting this into a broader framework. The median minimizes the sum of |x−m|. (So it’s the max-likelihood estimator if your tails are like exp(−|t|).) The mean minimizes the sum of |x−m|². (So it’s the max-likelihood estimator if your tails are like exp(−t²), a normal distribution.)
What about other exponents? 1 and 2 are kinda standard cases; what about 0 or infinity? Or negative numbers? Let’s consider 0 first of all. |x−m|⁰ is zero if m=x and 1 otherwise. Minimizing the sum of these means maximizing the number of things equal to m. This is the mode! (We’ll continue to get the mode if we use negative exponents. In that case we’d better maximize the sum instead of minimizing it, of course.) As p increases without limit, minimizing the sum of |x−m|ᵖ gets closer and closer to minimizing max(|x−m|). If the 0-mean is the mode, the 1-mean is the median and the 2-mean is the ordinary mean, then the infinity-mean is midway between the max and min of your data. This one doesn’t get quite so much attention in stats class :-).
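To make this concrete, here’s a numerical sketch (the name `p_mean` and the example data are mine, not standard) that brute-forces the m minimizing Σ|x−m|ᵖ over a fine grid:

```python
import numpy as np

data = np.array([1.0, 2.0, 2.0, 3.0, 7.0])

def p_mean(xs, p):
    """Brute-force the m minimizing sum(|x - m|**p) over a fine grid.

    A numerical sketch, not an efficient estimator.
    """
    grid = np.linspace(xs.min(), xs.max(), 100001)
    costs = (np.abs(xs[:, None] - grid[None, :]) ** p).sum(axis=0)
    return grid[costs.argmin()]

print(p_mean(data, 1))   # ~2: the median
print(p_mean(data, 2))   # ~3: the mean
print(p_mean(data, 50))  # ~4: approaching the midpoint of min and max
```

(p=0 is omitted: on a grid of candidate m, |x−m|⁰ is 1 almost everywhere, so you’d have to restrict m to the data points themselves to recover the mode.)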
The median is famously more robust than the mean: it’s affected less by outliers. (This goes along with the fatter tails it assumes: if you assume a very thin-tailed distribution, then an outlying point is super-unlikely and you’re going to have to try very hard to make it less outlying.) The mode is more robust still, in that sense. The “infinity-mean” (note: these are my names, and so far as I know no one else uses them) is kinda the least robust average you can imagine, being affected *only* by the most outlying data points.
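A quick illustration of that robustness ordering (made-up data; the midrange stands in for the “infinity-mean”): drag the largest point far out and see which averages notice.

```python
import statistics

base = [1, 2, 2, 3, 5]
shifted = [1, 2, 2, 3, 50]  # same data, but the top point dragged far out

midrange = lambda xs: (min(xs) + max(xs)) / 2

for name, f in [("mode", statistics.mode),
                ("median", statistics.median),
                ("mean", statistics.mean),
                ("midrange", midrange)]:
    print(name, f(base), f(shifted))
```

The mode and median don’t move at all; the mean moves a fair bit; the midrange jumps the most, tracking only the outlier.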
The standard name for the “infinity-mean” is the midrange.
Yeah, thanks for this comment, I sorta skipped it because I didn’t want to write too much… or something. In retrospect I’m not sure I modelled curious readers well enough, I should’ve just left it in.
One thing I noticed that I’m not so sure about: A motivation you might have for |x−m|² over |x−m|¹ (i.e. mean over median) is that you want a summary statistic that always changes when the data points do. As you move from 1,2,3 to 1,2,4, the median doesn’t change but the mean does.
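That example in two lines of the standard library:

```python
from statistics import mean, median

# The median is blind to the change; the mean picks it up.
print(median([1, 2, 3]), median([1, 2, 4]))  # both 2
print(mean([1, 2, 3]), mean([1, 2, 4]))      # these differ
```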
And yet, given that as p rises the minimizer of Σ|x−m|ᵖ approaches the midpoint of the max and min, it’s curious that we’ve chosen p=2. We wanted a summary statistic that changes whenever the data do, but of all the exponents with that property, we picked the one that changes least with the data: we could’ve settled on any integer greater than 1, and we picked 2.
From a purely mathematical point of view I don’t see why the exponent should be an integer. But p=2 is preferred over all other real values because of the Central Limit Theorem.
A longer explanation with pictures can be found here: “Mean, median, mode, a unifying perspective”.
I don’t think maximizing the sum of the negative exponents gets you the mode. If you use 0⁻ᵖ=0 then the supremum (infinity) is not attained, while if you use 0⁻ᵖ=∞ then the maximum (infinity) is attained at any data point. If you do it with a continuous distribution you get more sensible answers but the solution (which is intuitively the “point of greatest concentration”) is not necessarily unique.
It’s worth mentioning that when p>1 the p-mean is unique: this is because m↦|x−m|ᵖ is strictly convex, the sum of strictly convex functions is strictly convex, and a strictly convex function has at most one minimizer. (Convexity alone isn’t enough: |x−m|¹ is convex but the median can be non-unique.)
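The contrast is easy to see numerically (a sketch with made-up data): for p>1 the objective has a single minimum, while for p<1 the per-point term is no longer convex and the objective develops a sharp dip at every data point.

```python
import numpy as np

data = np.array([0.0, 1.0, 5.0])
grid = np.linspace(-1.0, 6.0, 7001)

def local_minima(p):
    """Count strict local minima of m -> sum(|x - m|**p) sampled on the grid."""
    obj = (np.abs(data[:, None] - grid[None, :]) ** p).sum(axis=0)
    interior = obj[1:-1]
    return int(((interior < obj[:-2]) & (interior < obj[2:])).sum())

print(local_minima(2.0))  # 1: strictly convex, one minimizer (the mean)
print(local_minima(0.5))  # 3: a local minimum at each of the three data points
```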
I’m using 0⁻ᵖ=∞ and using the cheaty convention that e.g. 3⋅∞>2⋅∞. I think this is what you get if you regard a discrete distribution as a limit of continuous ones. If this is too cheaty, of course it’s fine just to stick with non-negative values of p.
Yeah, OK. It works but you need to make sure to take the limit in a particular way, e.g. convolution with a sequence of approximations to the identity. Also you need to assume that p>−1 since otherwise the statistic diverges even for the continuous distributions.