Pick some reasonable priors and use them to answer the following question.
On week 1, Grandma calls on Thursday to say she is coming over, and then comes over on Friday. On week 2, Grandma once again calls on Thursday to say she is coming over, and then comes over on Friday. On week 3, Grandma does not call on Thursday to say she is coming over. What is the probability that she will come over on Friday?
ETA: This is a problem, not a puzzle. Disclose your reasoning, and your chosen priors, and don’t use ROT13.
In the calls, does she specify when she is coming over? I.e. does she say she’ll be coming over on Thursday, Friday, just sometime in the near future, or she leaves it for you to infer?
The information I gave is the information you have. Don’t make me make the problem more complicated.
ETA: Let me expand on this before people start getting on my case.
Rationality is about coming to the best conclusion you can given the information you have. If the information available to you is limited, you just have to deal with it.
Besides, sometimes, having less information makes the problem easier. Suppose I give you the following physics problem:
I throw a ball from a height of 4 feet; its maximum height is 10 feet. How long does it take from the time I throw it for it to hit the ground?
This problem is pretty easy. Now, suppose I also tell you that the ball is a sphere, and I tell you its mass and radius, and the viscosity of the air. This means that I’m expecting you to take air resistance into account, and suddenly the problem becomes a lot harder.
If you really want a problem where you have all the information, here:
Every time period, input A (of type Boolean) is revealed, and then input B (also of type Boolean) is revealed. There are no other inputs. In time period 0, input A is revealed to be TRUE, and then input B is revealed to be TRUE. In time period 1, input A is revealed to be TRUE, and then input B is revealed to be TRUE. In time period 2, input A is revealed to be FALSE. What is the probability that input B will be revealed to be TRUE?
Having less information makes easier the problem of satisfying the teacher. It does not make easier the problem of determining when the ball hits the ground. Incidentally, I got the impression somehow that there are venues where physics teachers scold students for using too much information.
ETA (months later): I do think it’s a good exercise, I just think this is not why.
Here, though, the problem actually is simpler the less information you have. As an extreme example, if you know nothing, the probability is always 1⁄2 (or whatever your prior is).
I can say immediately that it is less than 50%; to be more rigorous would take a minute.
Edit: Wait—no, I can’t. If the variables are related, then that conclusion would appear, but it’s not necessary that they be.
Let
AN = “Grandma calls on Thursday of week N”,
BN = “Grandma comes on Friday of week N”.
A toy version of my prior could be reasonably close to the following:
P(AN)=p, P(AN,BN)=pq, P(~AN,BN)=(1-p)r
where
the distribution of p is uniform on [0,1]
the distribution of q is concentrated near 1 (distribution proportional to f(x)=x on [0,1], let’s say)
the distribution of r is concentrated near 0 (distribution proportional to f(x)=1-x on [0,1], let’s say)
Thus, the joint probability distribution of (p,q,r) is given by 4q(1-r) once we normalize. Now, how does the evidence affect this? The likelihood ratio for (A1,B1,A2,B2) is proportional to (pq)^2, so after multiplying and renormalizing, we get a joint probability distribution of 24p^2q^3(1-r). Thus P(~A3|A1,B1,A2,B2)=1/4 and P(~A3,B3|A1,B1,A2,B2)=1/12, so I wind up with a 1 in 3 chance that Grandma will come on Friday, if I’ve done all my math correctly.
Of course, this is all just a toy model, as I shouldn’t assume things like “different weeks are independent”, but to first order, this looks like the right behavior.
I should have realized this sooner: P(B3|~A3) is just the updated value of r, which isn’t affected at all by (A1,B1,A2,B2). So of course the answer according to this model should be 1⁄3, as it’s the expected value of r in the prior distribution.
Still, it was a good exercise to actually work out a Bayesian update on a continuous prior. I suggest everyone try it for themselves at least once!
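For anyone who wants to try it without doing the integrals by hand, here is a Monte Carlo sketch of the same update (my own check, not part of the comment above; the inverse-CDF sampling for q and r is the only extra machinery):

```python
import random

random.seed(0)

# Toy prior: p ~ Uniform(0,1); q has density 2x on [0,1], sampled as
# sqrt(U); r has density 2(1-x) on [0,1], sampled as 1 - sqrt(U).
# The likelihood of the data (A1,B1,A2,B2) is (p*q)**2, used here as an
# importance weight.  We want P(B3 | ~A3, data), i.e. the posterior
# ratio E[(1-p)*r] / E[1-p].
N = 200_000
num = den = 0.0
for _ in range(N):
    p = random.random()
    q = random.random() ** 0.5
    r = 1.0 - random.random() ** 0.5
    w = (p * q) ** 2                  # importance weight = likelihood
    num += w * (1.0 - p) * r
    den += w * (1.0 - p)

estimate = num / den
print(estimate)   # close to the exact answer 1/3
```

The estimate lands near 1⁄3, matching the closed-form result.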
I fail to see how this question has a perceptibly rational answer—too much depends on the prior.
Presumably, once you’ve picked your priors, the rest follows. And presumably, once you’ve come up with an answer, you’ll disclose your reasoning, and your chosen priors.
Does she come over unannounced on any days other than Friday?
I don’t know.
Using the information that she is my grandmother, I speculate on the reason why she did not call on Thursday. Perhaps it is because she does not intend to come on Friday: P(Friday) is lowered. Perhaps it is because she does intend to come but judges the regularity of the event to make calling in advance unnecessary unless she had decided not to come: P(Friday) is raised. Grandmothers tend to be old and consequently may be forgetful: perhaps she intends to come but has forgotten to call: P(Friday) is raised. Grandmothers tend to be old, and consequently may be frail: perhaps she has been taken unwell; perhaps she is even now lying on the floor of her home, having taken a fall, and no-one is there to help: P(Friday) is lowered, and perhaps I should phone her.
My answer to the problem is therefore: I phone her to see how she is and ask if she is coming tomorrow.
I know—this is not an answer within the terms of the question. However, it is my answer.
The more abstract version you later posted is a different problem. We have two observations of A and B occurring together, and that is all. Unlike the case of Grandma’s visits, we have no information about any causal connection between A and B. (The sequence of revealing A before B does not affect anything.) What is then the best estimate of P(B|~A)?
We have no information about the relation between A and B, so I am guessing that a reasonable prior for that relation is that A and B are independent. Therefore A can be ignored and the Laplace rule of succession applied to the two observations of B, giving 3⁄4.
ETA: I originally had a far more verbose analysis of the second problem based on modelling it as an urn problem, which I then deleted. But the urn problem may be useful for the intuition anyway. You have an urn full of balls, each of which is either rough or smooth (A or ~A), and either black or white (B or ~B). You pick two balls which turn out to be both rough and black. You pick a third and feel that it is smooth before you look at it. How likely is it to be black?
Directly using the Laplace rule of succession on the joint sample space A ⊗ B gives weights proportional to 3 for A&B and 1 each for A&~B, ~A&B, and ~A&~B (one pseudocount per cell, plus the two observations of A&B).

Conditioning on ~A, P(B|~A) = 1/(1+1) = 1⁄2. Assuming independence does make a significant difference on this little data.
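The same arithmetic in code, with the independence-based answer alongside for comparison (a sketch I added; the cell labels are mine):

```python
from fractions import Fraction

# Laplace's rule on the joint space: one pseudocount per cell, plus the
# observed counts (2, 0, 0, 0) for A&B, A&~B, ~A&B, ~A&~B.
counts = {"AB": 2, "A~B": 0, "~AB": 0, "~A~B": 0}
w = {cell: n + 1 for cell, n in counts.items()}        # 3, 1, 1, 1

# Condition on ~A: only the ~AB and ~A~B cells remain.
p_B_given_notA = Fraction(w["~AB"], w["~AB"] + w["~A~B"])
print(p_B_given_notA)                                  # 1/2

# Versus assuming independence and applying Laplace to B alone:
p_B_indep = Fraction(2 + 1, 2 + 2)
print(p_B_indep)                                       # 3/4
```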
We have no information about the relation between A and B, so I am guessing that a reasonable prior for that relation is that A and B are independent.

On the contrary, on two points.
First, “A and B are independent” is not a reasonable prior, because it assigns probability 0 to them being dependent in some way— or, to put it another way, if that were your prior and you observed 100 cases and A and B agreed each time (sometimes true, sometimes false), you’d still assume they were independent.
What you should have said, I think, is that a reasonable prior would have “A and B independent” as one of the most probable options for their relation, as it is one of the simplest. But it should also give some substantial weight to simple dependencies like “A and B identical” and “A and B opposite”.
Second, the sense in which we have no prior information about relations between A and B is not a sense that justifies ignoring A. We had no prior information before we observed them agreeing twice, which raises the probability of “A and B identical” while somewhat lowering that of “A and B independent”.
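A toy numeric version of this point (my own construction, not anything from the thread; the three-hypothesis menu and the uniform parameter priors are assumptions): give equal prior weight to "A and B independent", "A and B identical", and "A and B opposite", and update on the two TRUE/TRUE observations.

```python
from fractions import Fraction
from math import factorial

def beta_int(a, b):
    # integral over [0,1] of p^a (1-p)^b dp = a! b! / (a+b+1)!
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))

# Marginal likelihoods of seeing (A,B) = (TRUE,TRUE) twice:
L_ind  = beta_int(2, 0) * beta_int(2, 0)  # A~Bern(p), B~Bern(q): 1/9
L_same = beta_int(2, 0)                   # B = A: integral of p^2 = 1/3
L_opp  = Fraction(0)                      # B = ~A: (T,T) is impossible

Z = L_ind + L_same + L_opp
post_ind, post_same = L_ind / Z, L_same / Z    # 1/4 and 3/4

# Now observe A3 = FALSE.  This doesn't shift the hypothesis posterior
# (E[1-p] = 1/4 under both surviving hypotheses' posteriors for p).
# Under "independent", P(B3) is B's rule of succession, 3/4;
# under "identical", P(B3 | A3 = FALSE) = 0.
p_B = post_ind * Fraction(3, 4) + post_same * 0
print(p_B)   # 3/16
```

The two agreements shift three quarters of the posterior onto "identical", which is exactly why A cannot be ignored: the answer drops from 3⁄4 to 3⁄16.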
It’s true that the prior should not be “A and B are independent”. But shouldn’t symmetries of how they may be dependent give essentially the same result as assuming independence? Similar as to how any symmetric prior for how a coin is biased gives the same results for a prediction of probability of heads -- 1⁄2.
I don’t think independence is a good way to analyze things when the probabilities are near zero or one. Independence is just P[A] P[B] = P[AB]. If P[A] or P[B] are near zero or one, this is automatically “nearly true”.
Put another way, two observations of (A, B) give essentially no information about dependence by themselves. Dependence is encoded in the ratios between the four possibilities.
First, “A and B are independent” is not a reasonable prior, because it assigns probability 0 to them being dependent in some way.

This raises a question of the meaningfulness of second-order Bayesian reasoning. Suppose I had a prior for the probability of some event C of, say, 0.469. Could one object to that, on the grounds that I have assigned a probability of zero to the probability of C being some other value? A prior of independence of A and B seems to me of a like nature to an assignment of a probability to C.
On the second point, seeing A and B together twice, or twenty times, tells me nothing about their independence. Almost everyone has two eyes and two legs, and therefore almost everyone has both two eyes and two legs, but it does not follow from those observations alone that possession of two eyes either is, or is not, independent of having two legs. For example, it is well-known (in some possible world) that the rare grey-green greasy Limpopo bore worm invariably attacks either the eyes, or the legs, but never both in the same patient, and thus observing someone walking on healthy legs conveys a tiny positive amount of probability that they have no eyes; while (in another possible world) the venom of the giant rattlesnake of Sumatra rapidly causes both the eyes and the legs of anyone it bites to fall off, with the opposite effect on the relationship between the two misfortunes. I can predict that someone has both two eyes and two legs from the fact that they are a human being. The extra information about their legs that I gain from examining their eyes could go either way.
But that is just an intuitive ramble. What is needed here is a calculation, akin to the Laplace rule of succession, for observations in a 2×2 contingency table. Starting from an ignorance prior that the probabilities of A&B, A&~B, ~A&B, and ~A&~B are each 1⁄4, and observing a, b, c, and d examples of each, what is the appropriate posterior? Then fill in the values 2, 0, 0, and 0.
ETA: On reading the comments, I realise that the above is almost all wrong.
In order to have a probability distribution rather than just a probability, you need to ask a question that isn’t Boolean, i.e. one with more than two possible answers. If you ask “Will this coin come up heads on the next flip?”, you get a probability, because there are only two possible answers. If you ask “How many times will this coin come up heads out of the next hundred flips?”, then you get back a probability for each number from 0 to 100; that is, a probability distribution. And if you ask “What kind of coin do I have in my pocket?”, then you get a function that takes any possible description (from “copper” to “slightly worn 1980 American quarter”) and returns a probability of matching that description.
Suppose I had a prior for the probability of some event C of, say, 0.469. Could one object to that, on the grounds that I have assigned a probability of zero to the probability of C being some other value?

Depends on how you’re doing this; if you have a continuous prior for the probability of C, with an expected value of 0.469, then no, and future evidence will continue to modify your probability distribution. If your prior for the probability of C consists of a delta mass at 0.469, then yes, your model perhaps should be criticized, as one might criticize Rosenkrantz for continuing to assume his coin is fair after 30 consecutive heads.
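The coin contrast in numbers (a sketch I added; the uniform starting prior is an assumption):

```python
from fractions import Fraction

# Continuous prior on P(heads): Beta(a, b), with mean a / (a + b).
# After h heads and t tails the posterior is Beta(a + h, b + t).
a, b = 1, 1                 # uniform prior, mean 1/2
h, t = 30, 0                # thirty consecutive heads
posterior_mean = Fraction(a + h, a + b + h + t)
print(posterior_mean)       # 31/32: the continuous prior has adapted

# A delta-mass prior at 1/2 assigns zero probability to every other
# bias, so no amount of evidence ever moves it off 1/2.
```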
A Bayesian reasoner actually would have a hierarchy of uncertainty about every aspect of ver model, but the simplicity weighting would give them all low probabilities unless they started correctly predicting some strong pattern.
A prior of independence of A and B seems to me of a like nature to an assignment of a probability to C.

Independence has a specific meaning in probability theory, and it’s a very delicate state of affairs. Many statisticians (and others) get themselves in trouble by assuming independence (because it’s easier to calculate) for variables that are actually correlated.
And depending on your reference class (things with human DNA? animals? macroscopic objects?), having 2 eyes is extremely well correlated with having 2 legs.
On the second point, seeing A and B together twice, or twenty times, tells me nothing about their independence.

Even without any math, it already tells you that they are not mutually exclusive. See wnoise’s reply to the grandparent post for the Laplace-rule equivalent.
I really like your urn formulation.
OK, I’ll use the same model I use for text. The zeroth-order model is maxentropy, and the kth-order model is a k-gram model with a pseudocount of 2 (the alphabet size) allocated to the (k-1)th-order model.
In this case, since there’s never before been a Thursday in which she did not call, we default to the 1st-order model, which says the probability is 3⁄4 that she will come on Friday.
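One reading of this backoff scheme as code (a single-alphabet sketch of my interpretation; the function name is mine, and the two-alphabet Thursday/Friday variant is not handled):

```python
def kgram_prob(seq, sym, alphabet, k):
    """Hierarchical k-gram estimator: the order-k model adds a pseudocount
    of len(alphabet), allocated to the order-(k-1) model; order 0 is the
    max-entropy (uniform) distribution."""
    if k == 0:
        return 1.0 / len(alphabet)
    ctx = seq[len(seq) - (k - 1):] if k > 1 else ""
    total = hits = 0
    for i in range(len(seq) - len(ctx)):      # count continuations of ctx
        if seq[i:i + len(ctx)] == ctx:
            total += 1
            if seq[i + len(ctx)] == sym:
                hits += 1
    backoff = kgram_prob(seq, sym, alphabet, k - 1)
    return (hits + len(alphabet) * backoff) / (total + len(alphabet))

# Level 1 reduces to Laplace's rule of succession: after "visit" on both
# observed Fridays ("HH"), P(visit) = (2 + 1) / (2 + 2) = 3/4.
print(kgram_prob("HH", "H", "HT", 1))   # 0.75
```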
I beg your pardon?
Is this a standard model? Does it have a name? A reference?

I see that the level-1 model is Laplace’s rule of succession. Is there some clean statement about the level-k model? Is this a Bayesian update?
You seem to be treating the string as being labeled by alternating Thursdays and Fridays, which have letters drawn from different alphabets. The model easily extends to this, but it was probably worth saying, particularly since the two alphabets happen to have the same size.
I find it odd that almost everyone treated weeks as discrete events. In this problem, days seem like the much more natural unit to me. ata probably agrees with me, but he didn’t reach a conclusion. With weeks, we have very few observations, so a lot depends on our model, like whether we use alphabets of size 2 for Thursday and Friday (Peter), or whether we use alphabets of size 4 for the whole week (wnoise). I’m going to allow calls and visits on each day and use an alphabet of size 4 for each day. I think it would be better to use a Peter-ish system of separating morning visits from evening calls, but with data indexed by days, we have a lot of data, so I don’t think this matters so much.
I’ll run my weeks Sun–Sat. Weeks 1 and 2 are complete and week 3 is partial. I treat days as independent, each with 4 outcomes: ([no] visit) × ([no] call), and I interpret the unspecified days as having no call and no visit. Using Laplace’s rule of succession, we have a 4⁄23 chance of a visit, which sounds pretty reasonable to me. But if we use Peter’s hierarchical model, I think our chance of a visit is 4/23*4/17*4/14*4/11*4/8*4/5 = 1⁄500. That is, since we’ve never seen a visit after a no-call/no-visit day, the only way to get a visit is from level 1 of the model, so we multiply together the chances of falling through from level 2 to level 1, from level 3 to level 2, etc. The chance of falling through from level n+1 to level n is 4/(4+c), where c is the number of times we’ve seen an (n+1)-gram that continues the last n days. So for n=5, the last 5 days were no-visit/no-call, which we’ve seen once before, culminating in the no-visit/call Thursday of the second week; that’s our factor of 4⁄5. For n=4, we’ve seen the resolution of 4 consecutive no-visit/no-call days four times: once in the first week, twice in the second week, and once in the third week; that’s the 4⁄8.
1⁄500 seems awfully small to me. Am I using this model correctly? I like level 2, 4/23*4/17=4%, but maybe I’m implicitly getting “2” from a prior that the call is connected to the visit.
With Peter’s two alphabets, each of size two, level 1 yields 3⁄21, level 2 3/21*2/18 = 2%, and the full model 3/21*2/18*2/16*2/15*2/13*2/12*2/10*2/9*2/7*2/6*2/4*2/4 = 10^-8. Levels 1 and 2 were a little smaller than with the size-4 alphabet, but the full model was much smaller. I was expecting the probability of a visit to be about squared, but it was cubed.
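The 4⁄23 day-level figure from above can be reproduced mechanically (a sketch I added; the day counts are the ones assumed in the comment):

```python
from fractions import Fraction

# 19 observed days: Sun-Sat of weeks 1-2 (14), plus Sun-Thu of week 3 (5).
counts = {
    ("visit", "call"): 0,
    ("visit", "no-call"): 2,      # the two Fridays
    ("no-visit", "call"): 2,      # the two Thursdays
    ("no-visit", "no-call"): 15,  # everything else
}
n = sum(counts.values())          # 19 days
k = len(counts)                   # alphabet of size 4

# Laplace's rule over the 4 outcomes: add one pseudocount to each cell.
def prob(outcome):
    return Fraction(counts[outcome] + 1, n + k)

p_visit = prob(("visit", "call")) + prob(("visit", "no-call"))
print(p_visit)                    # 4/23
```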