utilitymonster comments on Open Thread: July 2010

utilitymonster 3 Jul 2010 17:28 UTC
10 points
Here’s a puzzle I’ve been trying to figure out. It involves observation selection effects and agreeing to disagree. It is related to a paper I am writing, so help would be appreciated. The puzzle is also interesting in itself.

Charlie tosses a fair coin to determine how to stock a pond. If heads, it gets ³⁄₄ big fish and ¹⁄₄ small fish. If tails, the other way around. After Charlie does this, he calls Al into his office. He tells him, “Infinitely many scientists are curious about the proportion of fish in this pond. They are all good Bayesians with the same prior. They are going to randomly sample 100 fish (with replacement) each and record how many of them are big and how many are small. Since so many will sample the pond, we can be sure that for any n between 0 and 100, some scientist will observe that n of his 100 fish were big. I’m going to take the first one that sees 25 big and team him up with you, so you can compare notes.” (I don’t think it matters much whether infinitely many scientists do this or just 3^^^3.)

Okay. So Al goes and does his sample. He pulls out 75 big fish and becomes nearly certain that ³⁄₄ of the fish are big. Afterwards, a guy named Bob comes to him and tells him he was sent by Charlie. Bob says he randomly sampled 100 fish, 25 of which were big. They exchange ALL of their information.
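
To make "nearly certain" concrete, here is a minimal Python sketch of Al's update. It assumes the 50/50 prior from the fair coin and independent draws (sampling with replacement); the function name `posterior_three_quarters_big` is only an illustrative label.

```python
from fractions import Fraction

def posterior_three_quarters_big(big: int, small: int) -> Fraction:
    """Posterior probability that 3/4 of the fish are big, given a sample.

    Assumes a 50/50 prior (the fair coin) and independent draws.
    The binomial coefficient is identical under both hypotheses, so it cancels.
    """
    like_mostly_big = Fraction(3, 4) ** big * Fraction(1, 4) ** small
    like_mostly_small = Fraction(1, 4) ** big * Fraction(3, 4) ** small
    return like_mostly_big / (like_mostly_big + like_mostly_small)

al = posterior_three_quarters_big(75, 25)
print(float(1 - al))  # Al's remaining doubt, on the order of 1e-24
```

Under the same assumptions, naively pooling both samples, `posterior_three_quarters_big(100, 100)`, gives exactly 1/2.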

Question: How confident should each of them be that ³⁄₄ of the fish are big?

Natural answer: Al should remain nearly certain that ¾ of the fish are big. He knew in advance that someone like Bob was certain to talk to him regardless of what proportion of fish were big. So he shouldn’t be the least bit impressed after talking to Bob.

But what about Bob? What should he think? At first glance, you might think he should be ⁵⁰⁄₅₀, since 50% of the fish he knows about have been big and his access to Al’s observations wasn’t subject to a selection effect. But that can’t be right, because then he would just be agreeing to disagree with Al! (This would be especially puzzling, since they have ALL the same information, having shared everything.) So maybe Bob should just agree with Al: he should be nearly certain that ¾ of the fish are big.

But that’s a bit odd. It isn’t terribly clear why Bob should discount all of his observations, since they don’t seem to be subject to any observation selection effect; at least from his perspective, his observations were a genuine random sample.

Things get weirder if we consider a variant of the case.

VARIANT: as before, but Charlie has a similar conversation with Bob. Only this time, he tells him he’s going to introduce Bob to someone who observed exactly 75 of 100 fish to be big.

New Question: Now what should Bob and Al think?

Here, things get really weird. By the reasoning that led to the Natural Answer above, Al should be nearly certain that ¾ are big and Bob should be nearly certain that ¼ are big. But that can’t be right. They would just be agreeing to disagree! (Which would be especially puzzling, since they have ALL the same information.) The idea that they should favor one hypothesis in particular is also disconcerting, given the symmetry of the case. Should they both be 50/50?

Here’s where I’d especially appreciate enlightenment:

1. If Bob should defer to Al in the original case, why? Can someone walk me through the calculations that lead to this?

2. If Bob should not defer to Al in the original case, is that because Al should change his mind? If so, what is wrong with the reasoning in the Natural Answer? If not, how can they agree to disagree?

3. If Bob should defer to Al in the original case, why not in the symmetrical variant?

4. What credence should they have in the symmetrical variant?

5. Can anyone refer me to some info on observation selection effects that could be applied here?
- Vladimir_M 3 Jul 2010 21:46 UTC
  7 points
  Parent
  First, let’s calculate the concrete probability numbers. If we are to trust this calculator, the probability of finding exactly 75 big fish in a sample of a hundred from a pond where 75% of the fish are big is approximately 0.09, while getting the same number in a sample from a 25% big pond has a probability on the order of 10^-25. The same numbers hold in the reverse situation, of course.
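
  These figures can be reproduced without an online calculator. A minimal Python sketch using only the standard library follows; `binom_pmf` is an illustrative helper name.

  ```python
  from math import comb

  def binom_pmf(k: int, n: int, p: float) -> float:
      """Probability of exactly k successes in n independent trials."""
      return comb(n, k) * p**k * (1 - p) ** (n - k)

  print(binom_pmf(75, 100, 0.75))  # roughly 0.09
  print(binom_pmf(75, 100, 0.25))  # on the order of 1e-25
  ```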
  
  Now, Al and Bob have to consider two possible scenarios:
  1. The fish are 75% big, Al got the decently probable ⁷⁵⁄₁₀₀ sample, but Bob happened to be the first scientist who happened to get the extremely improbable ²⁵⁄₁₀₀ sample, and there were likely 10^(twenty-something) or so scientists sampling before Bob.
  2. The fish are 25% big, Al got the extremely improbable ⁷⁵⁄₁₀₀ big sample, while Bob got the decently probable ²⁵⁄₁₀₀ sample. This means that Bob is probably among the first few scientists who have sampled the pond.
  So, let’s look at it from a frequentist perspective: if we repeat this game many times, what will be the proportion of occurrences in which each scenario takes place?
  
  Here we need an additional critical piece of information: how exactly was Bob’s place in the sequence of scientists determined? At this point, an infinite number of scientists will give us lots of headache, so let’s assume it’s some large finite number N_sci, and Bob’s place in the sequence is determined by a random draw with probabilities uniformly distributed over all places in the sequence. And here we get an important intermediate result: assuming that at least one scientist gets to sample ²⁵⁄₁₀₀, the probability for Bob to be the first to sample ²⁵⁄₁₀₀ is independent of the actual composition of the pond! Think of it by means of a card-drawing analogy. If you’re in a group of 52 people whose names are repeatedly called out in random order to draw from a deck of cards, the proportion of drawings in which you get to be the first one to draw the ace of spades will always be ¹⁄₅₂, regardless of whether it’s a normal deck or a non-standard one with multiple aces of spades, or even a deck of 52 such aces!
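
  The symmetry claim can be checked with a short calculation. The sketch below is a toy model under stated assumptions: each of `n_sci` scientists independently gets the target result with probability `p_hit`, and Bob's position in the sequence is uniformly random. The function names are illustrative.

  ```python
  def prob_bob_is_first(n_sci: int, p_hit: float) -> float:
      """P(Bob is the first scientist to get the target result).

      Bob sits at a uniformly random position; he is first if he hits
      and everyone ahead of him misses.
      """
      total = 0.0
      for ahead in range(n_sci):  # number of scientists ahead of Bob
          total += (1 / n_sci) * (1 - p_hit) ** ahead * p_hit
      return total

  def prob_someone_hits(n_sci: int, p_hit: float) -> float:
      """P(at least one of the n_sci scientists gets the target result)."""
      return 1 - (1 - p_hit) ** n_sci

  for p_hit in (0.9, 0.1, 0.01):
      first = prob_bob_is_first(1000, p_hit)
      # Conditional on at least one hit, this is 1/n_sci for every p_hit.
      print(p_hit, first / prob_someone_hits(1000, p_hit))
  ```

  This float-based version is only meant for moderate probabilities; with values as small as 10^-25 it would lose precision.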
  
  Now compute the following probabilities:
  
  P1 = p(75% big fish) * p(Al samples ⁷⁵⁄₁₀₀ | 75% big fish) * p(Bob gets to be the first to sample ²⁵⁄₁₀₀)
  ~ 0.5 * 0.09 * 1/N_sci
  
  P2 = p(25% big fish) * p(Al samples ⁷⁵⁄₁₀₀ | 25% big fish) * p(Bob gets to be the first to sample ²⁵⁄₁₀₀)
  ~ 0.5 * 10^-25 * 1/N_sci
  
  (We ignore the finite, but presumably negligible probabilities that no scientist samples ²⁵⁄₁₀₀ in either case; these can be made arbitrarily low by increasing N_sci.)
  
  Therefore, we have P1 >> P2, i.e. the overwhelming majority of meetings between Al and Bob—which are by themselves extremely rare, since Al usually meets someone from the other (N_sci-1) scientists—happen under the first scenario, where Al gets a sample closely matching the actual ratio.
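
  Written as a posterior, with the approximate figures for P1 and P2, the 1/N_sci factor cancels:

  ```latex
  \[
  P(\text{75\% big} \mid \text{Al sees } 75/100,\ \text{Bob is first to see } 25/100)
    = \frac{P_1}{P_1 + P_2}
    \approx \frac{0.5 \cdot 0.09 \cdot \frac{1}{N_{\mathrm{sci}}}}
                 {0.5 \cdot 0.09 \cdot \frac{1}{N_{\mathrm{sci}}}
                  + 0.5 \cdot 10^{-25} \cdot \frac{1}{N_{\mathrm{sci}}}}
    = \frac{0.09}{0.09 + 10^{-25}}
    \approx 1 - 10^{-24}
  \]
  ```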
  
  Now, you say:
  
  It isn’t terribly clear why Bob should discount all of his observations, since they don’t seem to be subject to any observation selection effect; at least from his perspective, his observations were a genuine random sample.
  
  Not really, when you consider repeating the experiment. For the overwhelming majority of repetitions, Bob will get results close to the actual ratio, and on rare occasions he’ll get extreme outlier samples. Those repetitions in which he gets summoned to meet with Al, however, are not a representative sample of his measurements! The criteria for when he gets to meet with Al are biased towards including a greater proportion of his improbable ²⁵⁄₁₀₀ outlier results.
  
  As for this:
  
  VARIANT: as before, but Charlie has a similar conversation with Bob. Only this time, he tells him he’s going to introduce Bob to someone who observed exactly 75 of 100 fish to be big.
  
  I don’t think this is a well defined scenario. Answers will depend on the exact process by which this second observer gets selected. (Just like in the preceding discussion, the answer would be different if e.g. Bob had been always assigned the same place in the sequence of scientists.)
  - utilitymonster 4 Jul 2010 12:06 UTC
    1 point
    Parent
    I was assuming Charlie would show Bob the first person to see ⁷⁵⁄₁₀₀.
    
    Anyway, your analysis solves this as well. Being the first to see a particular result tells you essentially nothing about the composition of the pond (provided N_sci is sufficiently large that someone or other was nearly certain to see the result). Thus, each of Al and Bob should regard their previous observations as irrelevant once they learn that they were the first to get those results. Thus, they should just stick with their priors and be ⁵⁰⁄₅₀ about the composition of the pond.
- Blueberry 3 Jul 2010 17:38 UTC
  4 points
  Parent
  Interesting problem!
  
  (This would be especially puzzling, since they have ALL the same information, having shared everything.)
  
  It isn’t terribly clear why Bob should discount all of his observations, since they don’t seem to be subject to any observation selection effect; at least from his perspective, his observations were a genuine random sample.
  
  I think these two statements are inconsistent. If Bob is as certain as Al that Bob was picked specifically for his result, then they do have the same information, and they should both discount Bob’s observations to the same degree for that reason. If Bob doesn’t trust Al completely, they don’t have the same information. Bob doesn’t know for sure that Charlie told Al about the selection. From his point of view, Al could be lying.
  
  VARIANT: as before, but Charlie has a similar conversation with Bob. Only this time, he tells him he’s going to introduce Bob to someone who observed exactly 75 of 100 fish to be big.
  
  If Charlie tells both of them they were both selected, they have the same information (that both their observations were selected for that purpose, and thus give them no information) and they can only decide based on their priors about Charlie stocking the pond.
  
  If each of them only knows the other was selected and they both trust the other one’s statements, same thing. But if each puts more trust in Charlie than in the other, then they don’t have the same information.
  - prase 3 Jul 2010 18:42 UTC
    1 point
    Parent
    
    If Charlie tells both of them they were both selected, they have the same information (that both their observations were selected for that purpose, and thus give them no information) and they can only decide based on their priors about Charlie stocking the pond.
    
    It is strange. Should Bob discount his observation after being told that he is selected? What does it actually mean to be selected? What if Bob finds 25 big fish and then Charlie tells him that there are 3^^^3 other observers and he (Charlie) decided to “select” one of those who observe 25 big fish and talk to him, and that Bob himself is the selected one (no later confrontation with AI)? Should this information cancel Bob’s observations? If so, why?
    - Kingreaper 5 Jul 2010 14:16 UTC
      1 point
      Parent
      Yes, it should, if it is known that Charlie hasn’t previously “selected” any other people who got precisely 25.
      
      The probability of being selected (taken before you have found any fish) p[chosen] is approximately equal regardless of whether there are 25% or 75% big fish.
      
      And the probability of you being selected if you didn’t find 25 p[chosen|not25] is zero
      
      Therefore, the probability of you being selected, given that you have found 25 big fish, p[chosen|found25], is approximately equal to p[chosen]/p[found25]
      
      The information of the fact you’ve been chosen directly cancels out the information from the fact you found 25 big fish.
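
      Written out with Bayes’ theorem, where H is shorthand introduced here for a hypothesis about the pond (25% or 75% big fish), and using the fact that being chosen implies having found 25:

      ```latex
      \[
      p[H \mid \text{found25}, \text{chosen}]
        = p[H \mid \text{chosen}]
        = \frac{p[H]\, p[\text{chosen} \mid H]}{p[\text{chosen}]}
        \approx p[H],
      \]
      \[
      \text{since } p[\text{chosen} \mid H] \approx p[\text{chosen}] \text{ for both values of } H .
      \]
      ```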
    - utilitymonster 3 Jul 2010 19:11 UTC
      0 points
      Parent
      Glad to see we’re on the same page.
  - utilitymonster 3 Jul 2010 19:01 UTC
    0 points
    Parent
    I’m not sure about this:
    
    If Bob is as certain as Al that Bob was picked specifically for his result, then they do have the same information, and they should both discount Bob’s observations to the same degree for that reason.
    
    Here’s why:
    
    VARIANT 2: Charlie has both Al and Bob into his office before the drawings take place. He explains that the first guy (other than Al) to see ²⁵⁄₁₀₀ big will report to Al. Bob goes out and sees ²⁵⁄₁₀₀ big. To his surprise, he gets called into Charlie’s office and informed that he was the first to see that result.
    
    Question: right now, what should Bob expect to hear from Al?
    
    Intuitively, he should expect that Al had similar results. But if you’re right, it would seem that Bob should discount his results once he talks to Charlie and finds out that he is the messenger. If that’s right, he should have no idea what to expect Al to say. But that seems wrong. He hasn’t even heard anything from Al.
    
    If you’re still not convinced, consider:
    
    VARIANT 3: Charlie has both Al and Bob into his office before the drawings take place. He explains that the first guy (other than Al) to see ²⁵⁄₁₀₀ big will win a trip to Hawaii. Bob goes out and sees ²⁵⁄₁₀₀ big. To his surprise, he gets called into Charlie’s office and informed that he was the first to see that result.
    
    I can see no grounds for treating VARIANT 3 differently from VARIANT 2. And it is clear that in VARIANT 3 Bob should not discount his results.
- RobinZ 3 Jul 2010 18:10 UTC
  2 points
  Parent
  One key observation is that Al made his observation after being told that he would meet someone who made a particular observation—specifically, the first person to make that specific observation, Bob. This makes Al and Bob special in different ways:
  - Al is special because he has been selected to meet Bob regardless of what he observes. Therefore his data is genuinely uncorrelated with how he was selected for the meeting.
  - Bob is special because he has been selected to meet Al because of the specific data he observes. More precisely, because he will be the first to obtain that specific result. Therefore his result has been selected, and he is only at the meeting because he happens to be the first one to get that result.
  In the original case, Bob’s result is effectively a lottery ticket—when he finds out from Al the circumstances of the meeting, he can simply follow the Natural Answer himself and conclude that his results were unlikely.
  
  In the modified case, assuming perfect symmetry in all relevant aspects, they can conclude that an astronomically unlikely event has occurred and they have no net information about the contents of the pond.
  - utilitymonster 3 Jul 2010 18:47 UTC
    0 points
    Parent
    
    Al is special because he has been selected to meet Bob regardless of what he observes. Therefore his data is genuinely uncorrelated with how he was selected for the meeting.
    
    Not quite. He was selected to meet someone like Bob, in the sense that whoever the messenger was, he’d have seen ²⁵⁄₁₀₀ big. He didn’t know he’d meet Bob. But he regards the identity of the messenger as irrelevant.
    
    You can bring out the difference by considering a variant of the case in which both Al and Bob hear about Charlie’s plan in advance. (In this variant, the first to see ²⁵⁄₁₀₀ big will visit Al.)
    
    What is the relevance of the fact that they observed a highly improbable event?
- Kingreaper 5 Jul 2010 13:56 UTC
  1 point
  Parent
  Okay, qualitative analysis without calculations:
  
  Let’s go for a large, finite, case. Because otherwise my brain will explode.
  
  Question 1: for any large, finite number of scientists Bob should defer MOSTLY to Alice.
  
  First, let’s look at Alice. With any large finite number of scientists there is a small finite chance that NO scientist will get that result. This chance is larger in the case where 75% of the fish are big. Thus, upon finding that a scientist HAS found 25 big fish, Alice must adjust her probability slightly towards 25% big fish.
  
  Bob has also received several new pieces of information.
  
  *He was the first to find 25 big fish. P[first25|found25] approaches 1/(N*P[found25]) as you increase the number of scientists N. This information almost entirely cancels out the information he already had.
  
  *All the information Alice had. This information therefore tips the scales.
  
  Bob’s final probability will be the same as Alice’s.
  
  Question two is N/A. I will answer question three in a reply to this to try and avoid a massive wall of text.
  - Kingreaper 5 Jul 2010 14:01 UTC
    1 point
    Parent
    Question 3: lateral answer: in the symmetrical variant the issue of “how many people are being given other people to meet, and is this entire thing just a weird trick” begins to rise.
    
    In fact, the probability of it being a weird trick is going to overshadow almost any other attempt at analysis. The first person to get 25 happens to be a person who is told they will meet someone who got 75, and the person who was told they would meet the first person to get 25 happens to get 75? Massively improbable.
    
    However, if it is not a trick, the probability is significantly in favour of it being 75% still. Alice isn’t talking to Bob due to the fact she got 75, she’s talking to Bob due to the fact he got the first 25. Otherwise Bob would most likely have ended up talking to someone else.
    
    The proper response at this point for both Alice and Bob is to simply decide that it is overwhelmingly probable that Charlie is messing with them.
    
    I can produce similar variants which don’t have this issue, and they come out to 50:50. These include: Everyone is told that the first person to get 25 will meet the first person to get 75.
- Dagon 4 Jul 2010 1:38 UTC
  1 point
  Parent
  What is each of their prior probabilities for this setup being true? Bob, knowing that he was selected for his unusual results, can pretty happily disregard them. If you win a lottery, you don’t update to believe that most tickets win. Bob now knows of 100 samples (Al’s) that relate to the prior, and accepts them. Bob’s sampling is of a different prior: coin flipped, then a specific resulting sample will be found.
  
  If they are both selected for their results, they both go to ⁵⁰⁄₅₀. Neither one has non-selected samples.
- prase 3 Jul 2010 18:34 UTC
  1 point
  Parent
  Is there any particular reason why one of the actors is an AI?
  - utilitymonster 3 Jul 2010 18:42 UTC
    2 points
    Parent
    Al, not AI. (“Al” as in “Alan”)
    - prase 3 Jul 2010 18:49 UTC
      5 points
      Parent
      Sorry. I have some Lesswrong bias.
      
      Google statistics on Less Wrong:
      
      AI (second i): 2400 hits
      Al (second L): 318 hits (mostly in “et al.” and “al Qaida”, without capital A)
      
      By the way, are these two strings distinguishable when written in the font of this site? Seem to me the same.
      - RobinZ 3 Jul 2010 18:57 UTC
        2 points
        Parent
        You’re right—they’re pixel-for-pixel identical. That’s a bit problematic.
      - Douglas_Knight 4 Jul 2010 4:32 UTC
        1 point
        Parent
        Maybe that’s why cryptographers say “Alice” rather than “Al.”
- JGWeissman 3 Jul 2010 18:22 UTC
  1 point
  Parent
  From Bob’s perspective, he was more likely to be chosen as the one to talk to Al if there are fewer scientists that observed exactly 25 big fish, which would happen if there are more big fish. So Bob should update on the evidence of being chosen.
  - utilitymonster 3 Jul 2010 19:45 UTC
    0 points
    Parent
    This should be important to the finite case. The probability of being the first to see ²⁵⁄₁₀₀ is WAY higher (x 10^25 or so) if the lake is ³⁄₄ full of big fish than if it is ¹⁄₄ full of big fish.
    
    But in the infinite case the probability of being first is 0 either way...
    - JGWeissman 3 Jul 2010 20:51 UTC
      4 points
      Parent
      There is a reason we consider infinities only as limits of sequences of finite quantities.
      
      Suppose you tried to sum the log-odds evidence of the infinitely many scientists that the pond has more big fish. Well, some of them have positive evidence (summing to positive infinity), some have negative evidence (summing to negative infinity), and you can, by choosing the order of summation, get any result you want (up to some granularity) between negative and positive infinity.
      
      You don’t need anthropic tricks to make things weird if you have actual infinities in the problem.
    - Vladimir_M 4 Jul 2010 4:53 UTC
      1 point
      Parent
      utilitymonster:
      
      The probability of being the first to see ²⁵⁄₁₀₀ is WAY higher (x 10^25 or so) if the lake is ³⁄₄ full of big fish than if it is ¹⁄₄ full of big fish.
      
      Maybe I’m misunderstanding your phrasing here, but it sounds fallacious. If there’s a deck of cards and you’re in a group of 52 people who are called out in random order and told to pick one card each from the deck, the probability of being the first person to draw an ace is exactly the same (1/52) regardless of whether it’s a normal deck or a deck of 52 aces (or even a deck with 3 out of 4 aces replaced by other cards). This result doesn’t even depend on whether the card is removed or returned into the deck after each person’s drawing; the conclusion follows purely from symmetry. The only special case is when there are zero aces, in which the event becomes impossible, with p=0.
      
      Similarly, if the order in which the scientists get their samples is shuffled randomly, and we ignore the improbable possibility that nobody sees ²⁵⁄₁₀₀, then purely by symmetry, the probability that Bob happens to be the first one to see ²⁵⁄₁₀₀ is the same regardless of the actual frequency of the ²⁵⁄₁₀₀ results: p = 1/N(scientists).
      - utilitymonster 4 Jul 2010 11:47 UTC
        1 point
        Parent
        You’re right, thanks.
        
        I was considering an example with 10^100 scientists. I thought that since there would be a lot more scientists who got 25 big in the ¹⁄₄ scenario than in the ³⁄₄ scenario (about 9.18 x 10^98 vs. 1.279 x 10^75), you’d be more likely to be first in the ³⁄₄ scenario. But this forgets about the probability of getting an improbable result.
        
        In general, if there are N scientists, and the probability of getting some result is p, then we can expect Np scientists to get that result on average. If the order is shuffled as you suggest, then the probability of being the first to get that result is p * 1/(Np) = 1/N. So the probability of being the first to get the result is the same, regardless of the likelihood of the result (assuming someone will get the result).
        
        EDIT: It occurs to me that I might have been thinking about the probability of being selected by Al conditional on getting ²⁵⁄₁₀₀. In that case, you’re a lot more likely to be selected if the pond is ³⁄₄ big than if it is ¹⁄₄ big, since WAY more people got similar results in the latter case. JGWeissman was probably thinking the same.
  - utilitymonster 3 Jul 2010 19:02 UTC
    0 points
    Parent
    What effect will updating on this information have?
- Soki 3 Jul 2010 21:07 UTC
  0 points
  Parent
  First of all, I think that if Al does not see a sample, it makes the problem a bit simpler. That is, Al just tells Bob that he (Bob) is the first person that saw 25 big fish.
  
  I think that the number N of scientists matters, because the probability that someone will come to see Al depends on that.
  
  Let’s call B the event that the lake has 75% big fish, S the opposite, and C the event that someone comes, which means that someone saw 25 big fish.
  
  Once Al sees Bob, he updates:
  P(B|C) = P(B) * P(C|B) / (1/2 * P(C|B) + 1/2 * P(C|S)).
  When N tends toward infinity, both P(C|B) and P(C|S) tend toward 1, and P(B|C) tends to ¹⁄₂.
  But for small values of N, P(C|B) can be very small while P(C|S) will be quite close to 1.
  Then the fact that someone was chosen lowers the probability of having a lake with mostly big fish.
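
  To see how N enters, here is a rough Python sketch of this update. It assumes each scientist's sample is independent, so that the probability of C given a lake type is 1 - (1 - p)^N, with p the chance that a single scientist sees exactly 25 big fish out of 100. The function names are illustrative.

  ```python
  from math import comb, expm1, log1p

  def binom_pmf(k: int, n: int, p: float) -> float:
      """Probability of exactly k successes in n independent trials."""
      return comb(n, k) * p**k * (1 - p) ** (n - k)

  def prob_someone_sees(p_hit: float, n_sci: float) -> float:
      """P(at least one of n_sci scientists gets the result).

      Equals 1 - (1 - p_hit)^n_sci, computed with log1p/expm1 so that
      a tiny p_hit is not rounded away.
      """
      return -expm1(n_sci * log1p(-p_hit))

  def posterior_mostly_big(n_sci: float) -> float:
      """P(B given C) with a 1/2 prior on B (lake is 75% big)."""
      p_c_given_b = prob_someone_sees(binom_pmf(25, 100, 0.75), n_sci)
      p_c_given_s = prob_someone_sees(binom_pmf(25, 100, 0.25), n_sci)
      return 0.5 * p_c_given_b / (0.5 * p_c_given_b + 0.5 * p_c_given_s)

  for n_sci in (1e2, 1e20, 1e25, 1e30):
      print(n_sci, posterior_mostly_big(n_sci))
  ```

  Under these assumptions the posterior stays far below ¹⁄₂ until N is comparable to 10^25, and only then climbs towards ¹⁄₂.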
  
  If N=infinity, then the probability of being chosen is 0, and I cannot use Bayes’ theorem.
  
  If Charlie keeps inviting scientists until one sees 25 big fish, then it becomes complicated, because the probability that you are invited is greater if the lake has more big fish. It may be a bit like the Sleeping Beauty or the absent-minded driver problem.
  
  Edited for formatting and misspellings