I ran some simulations in Python, and (if I did this correctly), it seems that if r > 0.95, you should expect the most extreme data-point on one variable to also be the most extreme on the other variable over 50% of the time (even more so if sample size n ≤ 100)
http://nbviewer.jupyter.org/github/ricardoV94/stats/blob/master/correlation_simulations.ipynb
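For anyone who wants to reproduce this without opening the notebook, here is a minimal sketch of that kind of simulation (my own code, not the notebook's; it assumes standard bivariate normal samples, and the helper name `p_double_max` is just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def p_double_max(r, n, trials=20_000):
    """Monte Carlo estimate of P(the same observation is the maximum
    of both variables) for n bivariate-normal pairs with correlation r."""
    cov = np.array([[1.0, r], [r, 1.0]])
    # shape (trials, n, 2): `trials` independent samples of size n
    xy = rng.multivariate_normal([0.0, 0.0], cov, size=(trials, n))
    x, y = xy[..., 0], xy[..., 1]
    # fraction of trials where the argmax index coincides
    return np.mean(x.argmax(axis=1) == y.argmax(axis=1))

print(p_double_max(0.95, 100))  # should land in the vicinity of the >50% claim
print(p_double_max(0.0, 100))   # independent case: baseline of 1/n = 0.01
```

With r = 0 the estimate should sit right at the 1/n baseline, which is a handy sanity check on the sampler.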
You can simulate it out easily, yeah, but the exact answer seems more elusive. I asked on CrossValidated whether anyone knew the formula for ‘the probability of the maximum being the same observation on both variables, given an r and n’, since it seems like the sort of thing order-statistics researchers would’ve solved long ago (it’s interesting and relevant to contests/competitions/searches/screening), but no one’s given an answer yet.
I have found something interesting in the ‘asymptotic independence’ order-statistics literature: apparently it’s been proven since 1960 that the extremes of two correlated distributions are asymptotically independent (provided r ≠ ±1). So as you increase n, the probability of a double maximum decreases toward the lower bound of 1/n.
The intuition here seems to be that any fixed r acts only as a constant-factor boost, while the competition from n grows without bound; so by making n arbitrarily large, you can erode away the constant-factor boost of any r, and thus drive the double-max probability down toward 1/n.
I suspected as much from my Monte Carlo simulations (Figure 2), but nice to have it proven for the maxima and minima. (I didn’t understand the more general papers, so I’m not sure what other order statistics are asymptotically independent: it seems like it should be all of them? But some papers need to deal with multiple classes of order statistics, so I dunno—are there order statistics, like maybe the median, where the probability of being the same order in both samples doesn’t converge on 1/n?)
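The median question can at least be poked at numerically. Here's a rough sketch (my own, with an illustrative helper name `p_same_rank`; it assumes bivariate normals and just checks whether the same observation holds rank k in both samples):

```python
import numpy as np

rng = np.random.default_rng(1)

def p_same_rank(r, n, k, trials=20_000):
    """Monte Carlo estimate of P(the same observation holds rank k
    in both variables) for n bivariate-normal pairs with correlation r."""
    cov = [[1.0, r], [r, 1.0]]
    xy = rng.multivariate_normal([0.0, 0.0], cov, size=(trials, n))
    ix = np.argsort(xy[..., 0], axis=1)[:, k]  # which observation has rank k in X
    iy = np.argsort(xy[..., 1], axis=1)[:, k]  # which observation has rank k in Y
    return np.mean(ix == iy)

# median (middle rank, odd n) vs. the 1/n independence baseline:
for n in (5, 25, 125):
    print(n, round(p_same_rank(0.5, n, k=n // 2), 3), "vs", round(1 / n, 3))
```

Whether the median's ratio to 1/n shrinks toward 1 like the maximum's does, I can't say from a sketch like this; it only shows how you'd watch the convergence empirically.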
I can do n=1 (the probability is 1, obviously) and n=2 (the probability is 1/2 + (1/π)·sin⁻¹(r), not so obviously). n=3 and up seem harder, and my pattern-spotting skills are not sufficient to intuit the general case from those two :-).
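The n=2 closed form is easy to check against simulation; a quick sketch (assuming standard bivariate normals, function names mine):

```python
import numpy as np

rng = np.random.default_rng(2)

def p_double_max_n2_exact(r):
    # the closed form for n=2: 1/2 + (1/pi) * arcsin(r)
    return 0.5 + np.arcsin(r) / np.pi

def p_double_max_n2_mc(r, trials=200_000):
    # Monte Carlo: fraction of correlated-normal pairs where the
    # same observation is the larger of both variables
    cov = [[1.0, r], [r, 1.0]]
    xy = rng.multivariate_normal([0.0, 0.0], cov, size=(trials, 2))
    return np.mean(xy[..., 0].argmax(axis=1) == xy[..., 1].argmax(axis=1))

for r in (0.0, 0.5, 0.95):
    print(r, p_double_max_n2_exact(r), p_double_max_n2_mc(r))
```

At r = 0.5 the exact value is exactly 2/3 (since sin⁻¹(1/2) = π/6), which makes a convenient spot check.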
Heh. I’ve sometimes thought it’d be nice to have a copy of Eureqa or the other symbolic tools, to feed the Monte Carlo results into and see if I could deduce any exact formula given their hints. I don’t need exact formulas often but it’s nice to have them. I’ve noticed people can do apparently magical things with Mathematica in this vein. All proprietary AFAIK, though.
Writeup: https://www.gwern.net/Order-statistics#probability-of-bivariate-maximum