I’d argue that moving the data like that is not as precise: “choose any data point right of the median, and add any amount to it” is a larger set of operations than “subtract the mean from the distribution”.
(Although: is there a larger class of operations than just subtracting the mean that result in identical means but different medians? If there were, that would damage my conception of robustness here, but I haven’t tried to think of how to find such a class of operations, if they exist.)
Subtracting the mean and then scaling the resulting distribution by any nonzero constant works: the centered distribution has mean 0 for every choice of scale factor, while its median (nonzero whenever the original mean and median differ) changes with the scale. (Both kinds of operation are sanity-checked in the sketch below.)
Alternatively, if you have a distribution and want to turn it into a different distribution with the same mean but different median:
You can move two data points, one by X, the other by -X, so long as this results in a non-zero net number of crossings over the former median.
This is guaranteed to be the case with either any sufficiently large positive X or any sufficiently large negative X.
This is admittedly a 1-dimensional subset of a 2-dimensional random space.
You can move three data points, one by X, one by Y, the last by -(X+Y), so long as this results in a non-zero net number of crossings over the former median.
This is guaranteed to be the case for large enough (absolute) values. (Unlike in the two-point case, this always works for large X and/or Y, regardless of sign.)
This is admittedly a 2-dimensional subset of a 3-dimensional random space.
etc.
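For what it’s worth, here is a minimal numerical sketch of both operations (the data values are made up purely for illustration):

```python
import statistics

data = [1.0, 2.0, 4.0, 7.0, 11.0]  # mean 5.0, median 4.0

# Subtract the mean, then rescale: the mean stays 0 for every nonzero
# constant c, while the median changes with c.
centered = [x - statistics.mean(data) for x in data]
for c in (1.0, 2.0, -3.0):
    scaled = [c * x for x in centered]
    print(c, statistics.mean(scaled), statistics.median(scaled))

# Two-point move: add +X to one point and -X to another, so the sum
# (and hence the mean) is unchanged. Both chosen points start left of
# the median (4.0); a large X pushes one of them across it, giving a
# net crossing, so the median moves while the mean stays 5.0.
X = 100.0
moved = data.copy()
moved[1] += X  # 2.0 -> 102.0: crosses the former median
moved[0] -= X  # 1.0 -> -99.0: stays left of it
print(statistics.mean(moved), statistics.median(moved))  # 5.0 7.0
```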
Aha. Now you are getting closer to the typical notion of robustness!
Imagine taking a sample of N elements (N odd, to keep things simple) from a distribution and applying a random infinitesimal perturbation: say, one chosen uniformly and independently from {−ϵ, 0, ϵ} for each element of the sample.
In order for the median to stay the same, the median element must not change[1]. So we have a probability of 1/3rd that the perturbation doesn’t change the median. This scales as O(1).
In order for the mean to stay the same, the resulting perturbation must have a mean of 0 (and hence a sum of zero). How likely is this? Well, this is just a lazy random walk. The resulting probability (in the large-N limit) is just[2]
P[X_N = 0] ≈ √3 / (2√(πN))[3]

This scales as O(N^(−1/2)).
[1] Because this is an infinitesimal perturbation, the probability that it changes which element is the median is ~zero.
[2] https://math.stackexchange.com/a/1327363/246278 with n=3 and l=N
[3] A wild π appeared![4]
[4] I don’t know why I am so amused by π turning up in ‘random’ places.
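To see both scaling behaviors concretely, here’s a quick Monte Carlo sketch of the experiment (N and the trial count are arbitrary choices of mine, not from the comment above):

```python
import math
import random

# Work in units of ϵ, so each element's perturbation is drawn
# uniformly and independently from {-1, 0, +1}.
N = 101          # odd, so the median is a single element
TRIALS = 200_000

median_unchanged = 0
mean_unchanged = 0
for _ in range(TRIALS):
    steps = [random.choice((-1, 0, 1)) for _ in range(N)]
    # Per footnote [1], an infinitesimal perturbation never changes
    # *which* element is the median, so the median survives iff that
    # element's perturbation is 0. By symmetry any fixed index has the
    # same 1/3 odds, so check index 0 as a stand-in.
    if steps[0] == 0:
        median_unchanged += 1
    # The mean survives iff the perturbations sum to zero (a lazy
    # random walk returning to the origin).
    if sum(steps) == 0:
        mean_unchanged += 1

print(median_unchanged / TRIALS)                     # ~ 1/3, independent of N
print(mean_unchanged / TRIALS)                       # shrinks as N grows
print(math.sqrt(3) / (2 * math.sqrt(math.pi * N)))   # large-N prediction
```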
I like that notion of robustness! I’m having trouble understanding the big-O behavior here because of the N^(−1/2) term: does the decreasing nature of this function as N goes up mean the mean becomes more robust than the median for large N, or does the median always win for any N?
Ah, to be clear:
There’s a 1/3rd chance that the median does not change under an infinitesimal perturbation as I’ve defined it.
There’s a Θ(N^(−1/2)) chance that the mean does not change under an infinitesimal perturbation as I’ve defined it.
Or, to flip it around:
There’s a 2/3rds chance that the median does change under an infinitesimal perturbation as I’ve defined it.
There’s a 1−Θ(N^(−1/2)) chance that the mean does change under an infinitesimal perturbation as I’ve defined it.
As you increase the number of data points, the mean asymptotes towards ‘almost always’ changing under an infinitesimal perturbation, whereas the median stays at a 2/3rds[1] chance.
[1] Minor self-nit: this was assuming an odd number of data points. That being said, the probability assuming an even number of data points (and hence a median that is the mean of the center two elements) actually works out the same: {−ϵ,−ϵ}, {−ϵ,0}, {0,−ϵ}, {0,ϵ}, {ϵ,0}, {ϵ,ϵ} all change the median, or 6 of the 9 possibilities.
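That 6-of-9 count can be brute-forced in a couple of lines (a tiny sketch, again working in units of ϵ):

```python
from itertools import product

# Perturb the two center elements by every pair from {-1, 0, +1} and
# count how often their average -- the median -- changes, i.e. how
# often the two perturbations fail to cancel.
changed = sum(1 for a, b in product((-1, 0, 1), repeat=2) if a + b != 0)
print(changed)  # 6 of the 9 pairs: the same 2/3rds chance as odd N
```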
Gotcha—thanks.