Cyan comments on A Request for Open Problems

Cyan 8 May 2009 18:31 UTC
2 points
This one’s well-known in certain quarters (so it’s not really open), but it may provide training for those unfamiliar with it.

Suppose that you observe two random samples* from the uniform distribution centered at unknown location c with width 1. Label the larger sample x_max and the smaller x_min. The random interval (x_min, x_max) is a 50% confidence interval for c because it contains c with 50% probability.
- What is the practical problem with this confidence interval?
- Do better.
* changed from “deviates” per comment below
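The 50% coverage claim can be checked with a quick simulation. This is only a sketch: the function name `coverage`, the fixed value of c, the trial count and the seed are all assumptions made for illustration, not part of the original comment.

```python
import random

def coverage(c=3.0, trials=100_000, seed=0):
    """Fraction of trials in which (x_min, x_max) contains c."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Two independent draws from Uniform(c - 0.5, c + 0.5), i.e. width 1.
        x1 = rng.uniform(c - 0.5, c + 0.5)
        x2 = rng.uniform(c - 0.5, c + 0.5)
        if min(x1, x2) < c < max(x1, x2):
            hits += 1
    return hits / trials

print(coverage())  # should land close to 0.5 for any choice of c
```

The result does not depend on the value chosen for c, which is what makes the interval a valid 50% confidence interval in the frequentist sense.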
- RolfAndreassen 8 May 2009 19:43 UTC
  1 point
  “The uniform distribution centered at c” does not seem to make sense. Did you perchance mean the Gaussian distribution? Further, ‘deviates’ looks like jargon to me. Can we use ‘samples’? I would therefore rephrase as follows, with a specific example to hang one’s visualisation on:
  
  Heights of male humans are known to have a Gaussian distribution of width 10 cm around some central value; unfortunately you have forgotten what the central value is. Joe is 180 cm, Stephen is 170 cm. The probability that the central value is between these two heights is 50%; explain why. Then find a better confidence interval for the central value.
  - Cyan 8 May 2009 20:14 UTC
    2 points
    I mean the continuous uniform distribution. “Centered at c” is intended to indicate that the mean of the distribution is c.
    
    ETA: Let me be specific—I’ll use the notation of the linked Wikipedia article.
    
    You know that b - a = 1.
    
    c = (a + b)/2 is unknown, and the confidence interval is supposed to help you infer it.
  - MrHen 8 May 2009 20:26 UTC
    0 points
    If exactly half of all men have a height less than the central value c, then randomly picking a sample will have a 50% chance of being below c. Picking two samples (A and B) results in four possible scenarios:
    
    1. A is less than c; B is greater than c
    2. A is less than c; B is less than c
    3. A is greater than c; B is greater than c
    4. A is greater than c; B is less than c
    
    The interval created by (A, B) contains c in scenarios (1) and (4) and does not contain c in scenarios (2) and (3). Since each scenario has an equal chance of occurring, c is in (A, B) 50% of the time.
    
    That is as far as I got just thinking about it. If I am on the right path I can keep plugging away.
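The four-scenario count above can be written out as a tiny enumeration. This is a sketch that assumes only what the comment assumes: each sample falls below or above c with probability 1/2, independently. The labels "below" and "above" are illustrative.

```python
from fractions import Fraction
from itertools import product

# Each sample independently lands below or above c with probability 1/2,
# so the four (A, B) combinations are equally likely.
sides = ("below", "above")
scenarios = list(product(sides, repeat=2))

# The interval (A, B) contains c exactly when the samples fall on opposite sides.
straddling = [s for s in scenarios if s[0] != s[1]]
prob = Fraction(len(straddling), len(scenarios))
print(prob)  # 1/2
```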
    - Cyan 8 May 2009 20:34 UTC
      2 points
      In the Gaussian case, you can do better than (A, B), but the demonstration of that fact won’t smack you in the face the way it does in the case of the uniform distribution.
      - steven0461 8 May 2009 20:43 UTC
        0 points
        One thing you can do in the uniform case is shorten the interval to at most length ¹⁄₂. Not sure if that’s face-smacking enough.
        Psy-Kosh 8 May 2009 22:39 UTC
        0 points
        You can do better than that. If the distance between the two data points is ⁷⁄₄, you can shrink the 100% confidence interval to ¹⁄₄, etc. (The extreme case is as the distance between the two data points approaches 2, your 100% confidence interval approaches size zero.)
        
        EDIT: whoops, I was stupid. Corrected ³⁄₄ to ⁷⁄₄ and 1 to 2. There, now it should be right
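The interval-shrinking idea in the two comments above comes from the fact that c must lie within half a width of both data points. A minimal sketch follows. The function name `sure_interval` and the `width` parameter are assumptions added here so the arithmetic can be checked on whichever scale one has in mind.

```python
def sure_interval(x1, x2, width=1.0):
    """Interval that must contain c, given two samples from
    Uniform(c - width/2, c + width/2)."""
    lo, hi = min(x1, x2), max(x1, x2)
    if hi - lo > width:
        raise ValueError("samples are farther apart than the distribution is wide")
    # c is within width/2 of each sample, so it is squeezed from both sides.
    return hi - width / 2, lo + width / 2

low, high = sure_interval(0.0, 0.75)
print(high - low)  # 0.25
```

The length of this interval is `width` minus the distance between the points, so it shrinks toward zero as the points approach the maximum possible separation. With `width=2.0` and points ⁷⁄₄ apart the length is again ¹⁄₄.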
      - AllanCrossman 8 May 2009 20:37 UTC
        0 points
        Do we know the heights of the men A and B? If so, we can get a better estimate of whether c lies between their heights by taking into account the difference between A and B...
        Cyan 8 May 2009 20:40 UTC
        0 points
        That’s the basic idea. Now apply it in the case of the uniform distribution.
        AllanCrossman 8 May 2009 20:41 UTC
        1 point
        If all men are (say) within 10 cm of each other, and the heights are uniformly distributed...
        
        … if we have two men who are 8 cm apart, then c lies between their heights with 80% probability?
        Cyan 8 May 2009 20:43 UTC
        0 points
        Getting there… 80% is too low.
        AllanCrossman 8 May 2009 20:47 UTC
        2 points
        Wait, what? It must be 100%...
        Cyan 8 May 2009 20:50 UTC
        2 points
        That’s it. The so-called 50% confidence interval sometimes contains c with certainty. Also, when x_max - x_min is much smaller than 0.5, 50% is a lousy summary of the confidence (ETA: common usage confidence, not frequentist confidence) that c lies between them.
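To see how badly "50%" summarizes individual cases, one can split simulated trials by the observed spread x_max - x_min and look at coverage within each group. This is a sketch only: the function name, the bin edges, the fixed c = 0, the trial count and the seed are illustrative assumptions.

```python
import random

def coverage_given_spread(d_lo, d_hi, c=0.0, trials=200_000, seed=0):
    """Coverage of (x_min, x_max) among trials whose spread lies in [d_lo, d_hi)."""
    rng = random.Random(seed)
    hits = kept = 0
    for _ in range(trials):
        x1 = rng.uniform(c - 0.5, c + 0.5)
        x2 = rng.uniform(c - 0.5, c + 0.5)
        lo, hi = min(x1, x2), max(x1, x2)
        if d_lo <= hi - lo < d_hi:
            kept += 1
            if lo < c < hi:
                hits += 1
    return hits / kept

print(coverage_given_spread(0.5, 1.0))  # widely spread samples
print(coverage_given_spread(0.0, 0.1))  # tightly clustered samples
```

In the first group the interval essentially always contains c. In the second it does so far less than half the time, even though the overall coverage across all trials is still 50%.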
        AllanCrossman 8 May 2009 20:54 UTC
        0 points
        If it’s less than 0.5, is the confidence simply that value times 2?
        steven0461 8 May 2009 21:00 UTC
        1 point
        “Confidence” in the statistics sense doesn’t always have much to do with how confident you are in the conclusion. Something that’s the real line in half of all cases and the empty set in the other half of all cases is a 50% confidence interval, but that doesn’t mean you’re ever 50% confident (in the colloquial sense) that the parameter is on the real line or that the parameter is in the empty set.
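The degenerate procedure described above is easy to write down. The sketch below uses hypothetical names (`silly_interval`, `silly_coverage`) and represents the empty set as `None`. It ignores the data entirely and still has 50% coverage.

```python
import math
import random

def silly_interval(rng):
    """Return the whole real line or the empty set, each with probability 1/2."""
    if rng.random() < 0.5:
        return (-math.inf, math.inf)
    return None  # stands in for the empty set

def silly_coverage(c=0.0, trials=100_000, seed=0):
    """Fraction of trials in which the reported set contains c."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        interval = silly_interval(rng)
        if interval is not None and interval[0] < c < interval[1]:
            hits += 1
    return hits / trials

print(silly_coverage())  # close to 0.5, yet no single output is "50% likely" to contain c
```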
        Vladimir_Nesov 8 May 2009 21:07 UTC
        2 points
        The Credible interval article on Wikipedia describes the distinction between frequentist and Bayesian confidence intervals.
        steven0461 8 May 2009 21:16 UTC
        2 points
        The general pattern here is that there’s something you do care about and something you don’t care about, and frequentism doesn’t allow you to talk about the thing you do care about, so it renames the thing you don’t care about in such a way as to suggest that it’s the thing you do care about, and everyone who doesn’t understand statistics well interprets it as such.
        Cyan 8 May 2009 21:10 UTC
        0 points
        The interesting thing about the confidence interval I’m writing about is that it has some frequentist optimality properties. (“Uniformly most accurate”, if anyone cares.)
        AllanCrossman 8 May 2009 21:19 UTC
        0 points
        Well. So if all men were within 10 cm of each other, and uniformly distributed, and we plucked 2 random men out, and they were 4 cm apart, would c be between them with 80% probability? Or some other value?
        steven0461 8 May 2009 21:27 UTC
        0 points
        The shorter man can be anywhere between c-5 and c+1, with all values equally probable. If he’s between c-5 and c-4 or between c and c+1, then c is not between them; if he’s between c-4 and c, then c is between them. So, assuming a uniform prior for c, the probability is ²⁄₃, if I’m not mistaken.
        AllanCrossman 8 May 2009 21:34 UTC
        0 points
        Ah, I see what I did wrong. I think.
        Cyan 8 May 2009 21:32 UTC
        0 points
        Yup. Under the uniform prior the posterior probability that c is between the two values is d/(1 - d) for 0 < d < 0.5, where d = x_max - x_min (and the width of the uniform data distribution is 1).
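The formula above, together with the certainty case for widely separated points, fits in a few lines. This is a sketch: the function name `prob_c_between` and the `width` parameter are assumptions, added so that the 10 cm example from the thread can be plugged in directly. It assumes a flat prior on c, as the comment does.

```python
from fractions import Fraction

def prob_c_between(d, width=1):
    """Posterior probability that c lies in (x_min, x_max) under a flat prior on c.

    d is x_max - x_min; the data are Uniform(c - width/2, c + width/2).
    """
    if not 0 <= d <= width:
        raise ValueError("d must lie between 0 and width")
    if 2 * d >= width:
        return 1  # the points are far enough apart that c must lie between them
    # x_min is equally likely anywhere in a range of length (width - d);
    # c lies between the two points on a sub-range of length d.
    return d / (width - d)

print(prob_c_between(Fraction(4), 10))  # 2/3
print(prob_c_between(Fraction(8), 10))  # 1
```

With `width=1` and d below 0.5 this reduces to d/(1 - d).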
        Cyan 8 May 2009 21:26 UTC
        0 points
        The answer to that depends on what you know about c beforehand—your prior probability for c.
        steven0461 8 May 2009 21:23 UTC
        0 points
        It’s not between them if the shorter man is 4-5 cm shorter than average or 0-1 cm taller than average, so yes, 80% assuming a uniform prior for c.
        Cyan 8 May 2009 21:01 UTC
        0 points
        Whoops, “confidence” is frequentist jargon. I’ll just say that any better method ought to take x_max - x_min into account.