First, let’s calculate the concrete probability numbers. If we are to trust this calculator, the probability of finding exactly 75 big fish in a sample of a hundred from a pond where 75% of the fish are big is approximately 0.09, while getting the same number in a sample from a 25% big pond has a probability on the order of 10^-25. The same numbers hold in the reverse situation, of course.
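For anyone who would rather not trust the calculator, here is a minimal sketch that recomputes both numbers from the binomial formula. The helper name binom_pmf is mine, and it assumes each of the 100 fish is an independent draw (i.e. the pond is large enough that sampling without replacement doesn't matter).

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent draws with success rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Exactly 75 big fish out of 100, from a 75%-big pond and from a 25%-big pond.
p_match = binom_pmf(75, 100, 0.75)
p_mismatch = binom_pmf(75, 100, 0.25)

print(f"75/100 from a 75% pond: {p_match:.4f}")
print(f"75/100 from a 25% pond: {p_mismatch:.3e}")

# The reverse situation (25/100) gives the same two numbers by symmetry.
print(f"25/100 from a 25% pond: {binom_pmf(25, 100, 0.25):.4f}")
```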
Now, Al and Bob have to consider two possible scenarios:
The fish are 75% big, Al got the decently probable 75⁄100 sample, but Bob happened to be the first scientist to get the extremely improbable 25⁄100 sample, and there were likely 10^(twenty-something) or so scientists sampling before Bob.
The fish are 25% big, Al got the extremely improbable 75⁄100 big sample, while Bob got the decently probable 25⁄100 sample. This means that Bob is probably among the first few scientists who have sampled the pond.
So, let’s look at it from a frequentist perspective: if we repeat this game many times, what will be the proportion of occurrences in which each scenario takes place?
Here we need an additional critical piece of information: how exactly was Bob’s place in the sequence of scientists determined? At this point, an infinite number of scientists will give us a lot of headaches, so let’s assume it’s some large finite number N_sci, and Bob’s place in the sequence is determined by a random draw with probabilities uniformly distributed over all places in the sequence. And here we get an important intermediate result: assuming that at least one scientist gets to sample 25⁄100, the probability for Bob to be the first to sample 25⁄100 is independent of the actual composition of the pond! Think of it by means of a card-drawing analogy. If you’re in a group of 52 people whose names are repeatedly called out in random order to draw from a deck of cards, the proportion of drawings in which you get to be the first one to draw the ace of spades will always be 1⁄52, regardless of whether it’s a normal deck or a non-standard one with multiple aces of spades, or even a deck of 52 such aces!
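The card-drawing claim is easy to check by simulation. The sketch below is a simplified stand-in, not the exact game: it assumes each person's draw succeeds independently with probability q (as if drawing with replacement), uses 10 people rather than 52 to keep it fast, and throws away rounds in which nobody succeeds. The function name first_hit_share and all parameters are my own.

```python
import random

def first_hit_share(n_people, q, trials, seed=0):
    """Among rounds where somebody succeeds, the share in which person 0 is first."""
    rng = random.Random(seed)
    first_was_me = 0   # rounds in which person 0 was the first to succeed
    someone_hit = 0    # rounds in which anybody succeeded at all
    for _ in range(trials):
        order = list(range(n_people))
        rng.shuffle(order)            # names are called out in random order
        for person in order:
            if rng.random() < q:      # this person's draw is a success
                someone_hit += 1
                if person == 0:
                    first_was_me += 1
                break
    return first_was_me / someone_hit

# Very different success rates, same share: roughly 1/10 every time.
shares = {q: first_hit_share(10, q, trials=100_000) for q in (0.05, 0.5, 1.0)}
for q, share in shares.items():
    print(f"q = {q}: share of rounds where I am first = {share:.3f}")
```

The success rate q plays the role of the pond composition here: it changes how soon the first success shows up, but not who it lands on.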
Now compute the following probabilities:
P1 = p(75% big fish) * p(Al samples 75⁄100 | 75% big fish) * p(Bob gets to be the first to sample 25⁄100) ~ 0.5 * 0.09 * 1/N_sci
P2 = p(25% big fish) * p(Al samples 75⁄100 | 25% big fish) * p(Bob gets to be the first to sample 25⁄100) ~ 0.5 * 10^-25 * 1/N_sci
(We ignore the finite, but presumably negligible probabilities that no scientist samples 25⁄100 in either case; these can be made arbitrarily low by increasing N_sci.)
Therefore, we have P1 >> P2, i.e. the overwhelming majority of meetings between Al and Bob—which are by themselves extremely rare, since Al usually meets someone from the other (N_sci-1) scientists—happen under the first scenario, where Al gets a sample closely matching the actual ratio.
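To put a number on "P1 >> P2", here is a short sketch of the same computation. The value of n_sci is purely hypothetical (any large value works, since it cancels in the ratio), and binom_pmf is my own helper, with the same independence assumption about the fish as in the probabilities above.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent draws with success rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n_sci = 10**30   # hypothetical number of scientists; cancels out below
prior = 0.5      # 50/50 prior over the two pond compositions

# p(Bob is the first to sample 25/100) is taken as 1/n_sci in both scenarios.
p1 = prior * binom_pmf(75, 100, 0.75) / n_sci   # scenario 1: 75% big fish
p2 = prior * binom_pmf(75, 100, 0.25) / n_sci   # scenario 2: 25% big fish

ratio = p1 / p2
posterior_75 = p1 / (p1 + p2)

print(f"P1 / P2 = {ratio:.3e}")
print(f"share of Al-Bob meetings under scenario 1 = {posterior_75}")
```

Algebraically the ratio is (0.75/0.25)^50 = 3^50, which is on the order of 10^23, so scenario 2 accounts for a vanishing share of the meetings.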
Now, you say:
It isn’t terribly clear why Bob should discount all of his observations, since they don’t seem to be subject to any observation selection effect; at least from his perspective, his observations were a genuine random sample.
Not really, when you consider repeating the experiment. For the overwhelming majority of repetitions, Bob will get results close to the actual ratio, and on rare occasions he’ll get extreme outlier samples. Those repetitions in which he gets summoned to meet with Al, however, are not a representative sample of his measurements! The criteria for when he gets to meet with Al are biased towards including a greater proportion of his improbable 25⁄100 outlier results.
As for this:
VARIANT: as before, but Charlie has a similar conversation with Bob. Only this time, he tells him he’s going to introduce Bob to someone who observed exactly 75 of 100 fish to be big.
I don’t think this is a well-defined scenario. Answers will depend on the exact process by which this second observer gets selected. (Just like in the preceding discussion, the answer would be different if e.g. Bob had always been assigned the same place in the sequence of scientists.)
I was assuming Charlie would show Bob the first person to see 75⁄100.
Anyway, your analysis solves this as well. Being the first to see a particular result tells you essentially nothing about the composition of the pond (provided N_sci is sufficiently large that someone or other was nearly certain to see the result). Thus, each of Al and Bob should regard their previous observations as irrelevant once they learn that they were the first to get those results. So they should just stick with their priors and be 50⁄50 about the composition of the pond.
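The "provided N_sci is sufficiently large" caveat can be made concrete. As a rough sketch, take one scientist placed uniformly at random among n_sci scientists, each independently getting the result in question with probability q. Summing over positions, the chance that this scientist is the first to get it is (1 - (1 - q)^n_sci) / n_sci. The function names and the specific n_sci values below are my own choices for illustration.

```python
from math import comb, expm1, log1p

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent draws with success rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def p_first(q, n_sci):
    """P(a uniformly placed scientist is the first to get a result of probability q).

    Equals (1 - (1 - q)**n_sci) / n_sci, written with log1p/expm1 so that
    tiny q and huge n_sci do not lose precision.
    """
    return -expm1(n_sci * log1p(-q)) / n_sci

def posterior_75_given_first_25(n_sci, prior=0.5):
    """Bob's posterior for the 75%-big pond, given he was the first to see 25/100."""
    like_75 = p_first(binom_pmf(25, 100, 0.75), n_sci)  # 25/100 is the outlier here
    like_25 = p_first(binom_pmf(25, 100, 0.25), n_sci)  # 25/100 is the typical result
    return prior * like_75 / (prior * like_75 + (1 - prior) * like_25)

for n_sci in (1.0, 1e20, 1e30):
    print(f"n_sci = {n_sci:.0e}: p(75% big | Bob first to see 25/100) = "
          f"{posterior_75_given_first_25(n_sci):.3g}")
```

With n_sci = 1 this reduces to the ordinary Bayesian update on Bob's own sample. The posterior only flattens out to the 50⁄50 prior once n_sci is far above 10^25, so that somebody is nearly certain to hit the outlier under either composition.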