The reason Lp distances matter here is: the different location parameters minimize different Lp distances. The mean is the single number $\mu$ that minimizes the L2 distance $L_2(x,\mu)=\sqrt{\sum_i^n (x_i-\mu)^2}$. The median is the number $\tilde{m}$ that minimizes $L_1(x,\tilde{m})$, and the mode is the $\hat{m}$ that minimizes $L_\infty(x,\hat{m})$.
I believe the mode minimizes the L0 distance rather than the L∞ distance? Since L0 measures whether two values are different, and the mode maximizes the probability that a random point is the same as the mode?
I think the number minimizing L∞ would be the mean of the minimum and the maximum value of the distribution, which we might call the “middle of the distribution”. (Unfortunately “middle” also sounds like it could describe the median, but I think it better describes $\tfrac{1}{2}(\min+\max)$ than it describes the median.) This probably also gives a hint as to what minimizing L4 would look like; it would tend to be somewhere in between the mean and the middle, like a location statistic that’s extra sensitive to outliers.
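To make the claims above easy to check, here is a minimal brute-force sketch. The sample `xs`, the helper names `lp_cost` and `argmin_center`, and the grid resolution are all made up for illustration; the p-th root is dropped because it does not change where the minimum sits. On this toy sample the L0, L1, L2 and L∞ minimizers should land on the mode, median, mean and ½(min+max) respectively, with the L4 minimizer somewhere between the mean and the middle.

```python
import statistics

def lp_cost(xs, c, p):
    """Cost of summarizing the sample xs by the single number c.

    p = 0 counts the points that differ from c, p = inf takes the largest
    absolute deviation, and any other p sums |x - c|**p. The p-th root is
    omitted because it does not move the minimizer.
    """
    if p == 0:
        return sum(1 for x in xs if x != c)
    if p == float("inf"):
        return max(abs(x - c) for x in xs)
    return sum(abs(x - c) ** p for x in xs)

def argmin_center(xs, p, steps_per_unit=100):
    """Brute-force the minimizing c over a grid spanning [min(xs), max(xs)]."""
    lo, hi = min(xs), max(xs)
    n = round((hi - lo) * steps_per_unit)
    grid = {lo + i / steps_per_unit for i in range(n + 1)}
    # Include the sample values themselves so the L0 cost can find exact matches.
    candidates = sorted(grid | set(xs))
    return min(candidates, key=lambda c: lp_cost(xs, c, p))

# Made-up sample chosen so that mode, median, mean and midrange all differ.
xs = [1, 1, 2, 4, 10]

centers = {p: argmin_center(xs, p) for p in (0, 1, 2, 4, float("inf"))}
for p, c in centers.items():
    print(f"L{p}: minimizer ~ {c:.2f}")

print("mode    ", statistics.mode(xs))
print("median  ", statistics.median(xs))
print("mean    ", statistics.mean(xs))
print("midrange", (min(xs) + max(xs)) / 2)
```

A grid search is only a sanity check, not a proof, and with an even number of points the L1 minimizer is a whole interval rather than a single number.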
(I’m skipping L3 for reasons. [Edit, a year later: I’ve forgotten these reasons])
Observation: for a variable that is positive and often close to zero (e.g. approximately lognormal), the minimum is near zero, so the L∞ minimizer would be about half of the maximum, which is isomorphic to literally just using the maximum. This seems like the quantitative version of focusing terrorism discussions on 9/11 and focusing wealth inequality discussions on Elon Musk and so on.
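A quick way to see this numerically, as a rough sketch: draw a lognormal sample and compare ½(min+max) with half the maximum. The seed, sample size and parameters (mu = 0, sigma = 1) are arbitrary choices for illustration, not anything from the discussion above.

```python
import random
import statistics

random.seed(0)  # arbitrary seed so the sketch is repeatable

# Hypothetical sample: 10,000 draws from a lognormal with mu = 0, sigma = 1.
sample = [random.lognormvariate(0.0, 1.0) for _ in range(10_000)]

lo, hi = min(sample), max(sample)
midrange = (lo + hi) / 2  # the L-infinity minimizer
half_max = hi / 2         # what it collapses to when the minimum is near zero

print(f"median   = {statistics.median(sample):.3f}")
print(f"mean     = {statistics.mean(sample):.3f}")
print(f"midrange = {midrange:.3f}")
print(f"max / 2  = {half_max:.3f}")
print(f"min/max  = {lo / hi:.5f}")
```

The gap between the midrange and half the maximum is exactly half the minimum, so for a sample like this the statistic is driven almost entirely by the single largest draw.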
Probably to avoid absolute values?