Yeah, thanks for this comment, I sorta skipped it because I didn’t want to write too much… or something. In retrospect I’m not sure I modelled curious readers well enough; I should’ve just left it in.
One thing I noticed that I’m not so sure about: a motivation you might have for minimizing the summed |x−m|^2 rather than the summed |x−m|^1 (i.e. mean over median) is that you want a summary statistic that always changes when the data points do. As you move from 1,2,3 to 1,2,4, the median doesn’t change but the mean does.
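A quick check of that example, as a minimal sketch using Python's standard `statistics` module:

```python
import statistics

before = [1, 2, 3]
after = [1, 2, 4]

# The median (the p=1 minimizer) ignores how far the largest point moved: both are 2.
print(statistics.median(before), statistics.median(after))

# The mean (the p=2 minimizer) shifts whenever any data point moves: 2 vs 7/3.
print(statistics.mean(before), statistics.mean(after))
```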
And yet, given that as p rises the value of m minimizing the summed |x−m|^p approaches the midpoint of the max and min, it’s curious that we’ve chosen p=2. We wanted a summary statistic that changed whenever the data did, but, of all the candidates, the one that changed the least with the data. We could’ve settled on any integer greater than 1, and we picked 2, the smallest.
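To see the trend numerically, here's a rough sketch. The helper `lp_center` is a hypothetical name, and the ternary search is just my choice of minimizer: it relies on the summed |x−m|^p being convex in m for p ≥ 1, with a minimizer somewhere between the min and max of the data.

```python
def lp_center(data, p, iters=200):
    """Return the m minimizing sum(|x - m|**p) over data (assumes p >= 1)."""
    def cost(m):
        return sum(abs(x - m) ** p for x in data)

    # For p >= 1 the cost is convex and a minimizer lies in [min, max].
    lo, hi = min(data), max(data)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if cost(m1) < cost(m2):
            hi = m2  # a minimizer lies in [lo, m2]
        else:
            lo = m1  # a minimizer lies in [m1, hi]
    return (lo + hi) / 2


before, after = [1, 2, 3], [1, 2, 4]
for p in (1, 2, 3, 50):
    # How far the "centre" moves when the largest point goes from 3 to 4.
    shift = lp_center(after, p) - lp_center(before, p)
    print(f"p={p}: centre moves by {shift:.3f}")
```

Working the minimizers out by hand for this data, the shift should come out as 0 for p=1, 1/3 for p=2, √12 − 3 ≈ 0.464 for p=3, and close to 0.5 (the midrange shift) for large p. So p=2 is the smallest integer exponent whose centre moves at all.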
From a purely mathematical point of view I don’t see why the exponent should be an integer. But p=2 is preferred over all other real values because of the Central Limit Theorem.