Vladimir_M comments on Open Thread: July 2010

Vladimir_M 3 Jul 2010 21:46 UTC
7 points
First, let’s calculate the concrete probability numbers. If we are to trust this calculator, the probability of finding exactly 75 big fish in a sample of a hundred from a pond where 75% of the fish are big is approximately 0.09, while getting the same number in a sample from a 25% big pond has a probability on the order of 10^-25. The same numbers hold in the reverse situation, of course.
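For anyone who would rather check these figures than trust a calculator, here is a minimal sketch. It assumes each fish in the sample is an independent draw (a binomial model, i.e. a pond large enough that sampling without replacement makes no practical difference); the helper name `binom_pmf` is my own, not something from the original discussion.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p (binomial model, an assumption)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly 75 big fish out of 100 when 75% of the pond is big.
p_match = binom_pmf(75, 100, 0.75)

# Exactly 75 big fish out of 100 when only 25% of the pond is big.
p_mismatch = binom_pmf(75, 100, 0.25)

print(f"matching pond:    {p_match:.3g}")
print(f"mismatching pond: {p_mismatch:.3g}")
```

By symmetry, swapping "big" and "small" gives the same two numbers for a ²⁵⁄₁₀₀ sample, which is why the reverse situation needs no separate calculation.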

Now, Al and Bob have to consider two possible scenarios:
1. The fish are 75% big, Al got the decently probable ⁷⁵⁄₁₀₀ sample, but Bob was the first scientist to get the extremely improbable ²⁵⁄₁₀₀ sample, and there were likely 10^(twenty-something) or so scientists sampling before Bob.
2. The fish are 25% big, Al got the extremely improbable ⁷⁵⁄₁₀₀ big sample, while Bob got the decently probable ²⁵⁄₁₀₀ sample. This means that Bob is probably among the first few scientists who have sampled the pond.
So, let’s look at it from a frequentist perspective: if we repeat this game many times, what will be the proportion of occurrences in which each scenario takes place?

Here we need an additional critical piece of information: how exactly was Bob’s place in the sequence of scientists determined? At this point, an infinite number of scientists would give us a lot of headaches, so let’s assume it’s some large finite number N_sci, and that Bob’s place in the sequence is determined by a random draw, uniformly distributed over all places in the sequence. And here we get an important intermediate result: assuming that at least one scientist gets to sample ²⁵⁄₁₀₀, the probability for Bob to be the first to sample ²⁵⁄₁₀₀ is independent of the actual composition of the pond! Think of it by means of a card-drawing analogy. If you’re in a group of 52 people whose names are repeatedly called out in random order to draw from a deck of cards, the proportion of drawings in which you get to be the first one to draw the ace of spades will always be ¹⁄₅₂, regardless of whether it’s a normal deck or a non-standard one with multiple aces of spades, or even a deck of 52 such aces!
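The independence claim can be checked with a small simulation. This is only a sketch under simplifying assumptions of mine: a tiny group of 10 scientists instead of a huge N_sci, and a single number `q` standing in for the chance that any one scientist's sample comes out as the target result (this is where the pond's composition would enter). The function name `first_hit_rate` is hypothetical.

```python
import random

def first_hit_rate(n_sci: int, q: float, trials: int, rng: random.Random) -> float:
    """Among trials in which at least one scientist gets the target result,
    return the fraction in which scientist 0 ("Bob") is the first to get it."""
    any_hit = 0
    bob_first = 0
    for _ in range(trials):
        order = list(range(n_sci))
        rng.shuffle(order)              # uniformly random place in the sequence
        for who in order:
            if rng.random() < q:        # this scientist draws the target result
                any_hit += 1
                if who == 0:
                    bob_first += 1
                break                   # only the first hit matters
    if any_hit == 0:
        raise ValueError("no trial produced the target result; raise trials or q")
    return bob_first / any_hit

rng = random.Random(2010)               # arbitrary seed, for repeatability
for q in (0.05, 0.5, 1.0):              # rare, common, and certain target result
    print(q, round(first_hit_rate(10, q, 20_000, rng), 3))
```

Whatever value `q` takes, the printed rate should hover around 1/10, which is the card-drawing point: how common the target result is changes how often anyone gets it, but not who gets it first.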

Now compute the following probabilities:

P1 = p(75% big fish) * p(Al samples ⁷⁵⁄₁₀₀ | 75% big fish) * p(Bob gets to be the first to sample ²⁵⁄₁₀₀)
~ 0.5 * 0.09 * 1/N_sci

P2 = p(25% big fish) * p(Al samples ⁷⁵⁄₁₀₀ | 25% big fish) * p(Bob gets to be the first to sample ²⁵⁄₁₀₀)
~ 0.5 * 10^-25 * 1/N_sci

(We ignore the finite, but presumably negligible probabilities that no scientist samples ²⁵⁄₁₀₀ in either case; these can be made arbitrarily low by increasing N_sci.)

Therefore, we have P1 >> P2, i.e. the overwhelming majority of meetings between Al and Bob—which are by themselves extremely rare, since Al usually meets someone from the other (N_sci-1) scientists—happen under the first scenario, where Al gets a sample closely matching the actual ratio.
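To put a number on "P1 >> P2", here is a short sketch using exact rational arithmetic. It again assumes a binomial model for the samples; the value of `n_sci` is a made-up placeholder, and it cancels out of the ratio anyway.

```python
from fractions import Fraction
from math import comb

n_sci = 10**30                      # hypothetical number of scientists
prior = Fraction(1, 2)              # 50/50 prior over the two pond compositions
first = Fraction(1, n_sci)          # p(Bob is first to sample 25/100), same in both cases

# p(Al samples exactly 75 big out of 100 | pond composition), binomial assumption
lik_75 = comb(100, 75) * Fraction(3, 4) ** 75 * Fraction(1, 4) ** 25
lik_25 = comb(100, 75) * Fraction(1, 4) ** 75 * Fraction(3, 4) ** 25

p1 = prior * lik_75 * first         # scenario 1: pond is 75% big
p2 = prior * lik_25 * first         # scenario 2: pond is 25% big

ratio = p1 / p2                     # exactly 3**50
share_scenario_2 = p2 / (p1 + p2)   # fraction of Al-Bob meetings under scenario 2

print(f"P1 / P2 = {float(ratio):.3g}")
print(f"share of meetings under scenario 2 = {float(share_scenario_2):.3g}")
```

The prior, the binomial coefficient, and the 1/N_sci factor all cancel, leaving P1/P2 = 3^50, which is on the order of 10^23 to 10^24.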

Now, you say:

It isn’t terribly clear why Bob should discount all of his observations, since they don’t seem to be subject to any observation selection effect; at least from his perspective, his observations were a genuine random sample.

Not really, when you consider repeating the experiment. For the overwhelming majority of repetitions, Bob will get results close to the actual ratio, and on rare occasions he’ll get extreme outlier samples. Those repetitions in which he gets summoned to meet with Al, however, are not a representative sample of his measurements! The criteria for when he gets to meet with Al are biased towards including a greater proportion of his improbable ²⁵⁄₁₀₀ outlier results.

As for this:

VARIANT: as before, but Charlie has a similar conversation with Bob. Only this time, he tells him he’s going to introduce Bob to someone who observed exactly 75 of 100 fish to be big.

I don’t think this is a well-defined scenario. Answers will depend on the exact process by which this second observer gets selected. (Just like in the preceding discussion, the answer would be different if e.g. Bob had always been assigned the same place in the sequence of scientists.)
- utilitymonster 4 Jul 2010 12:06 UTC
  1 point
  I was assuming Charlie would show Bob the first person to see ⁷⁵⁄₁₀₀.
  
  Anyway, your analysis solves this as well. Being the first to see a particular result tells you essentially nothing about the composition of the pond (provided N_sci is sufficiently large that someone or other was nearly certain to see the result). Thus, each of Al and Bob should regard their previous observations as irrelevant once they learn that they were the first to get those results. Thus, they should just stick with their priors and be ⁵⁰⁄₅₀ about the composition of the pond.