Turn the question on its head and make up a story where the math matches the observation in every circumstance. If you can, and I’m not sure I could, work backwards from there to find the breaking point. Or just remember that the map is not the territory, the finger that points to the moon is not the moon, and get on with things.
This was my attempt to make up a story where the math would match something real:
Statistically comparing two samples of equids would make some sense if Dr. Yagami had sampled 2987 horses and 8 zebras while Dr. Eru had sampled 2995 horses and 0 zebras. Then Fisher’s exact test could tell us that, with high probability, they did not sample the same population with the same methods.
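For concreteness, here is a rough sketch of what that comparison would look like (using scipy; the counts are the hypothetical ones from above):

```python
# Two real field samples of equids (hypothetical counts from above).
# Fisher's exact test asks: is the horse/zebra split plausibly the
# same in both samples?
from scipy.stats import fisher_exact

table = [
    [2987, 8],  # Dr. Yagami: horses, zebras
    [2995, 0],  # Dr. Eru:    horses, zebras
]
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"p = {p_value:.4f}")  # small p: unlikely to be the same population and methods
```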
But in the actual case what we have is just a “virtual sample”. I’m wondering if there are any conceivable circumstances where a virtual sample would make sense.
How about the classic example of testing whether a coin is biased? This seems to use a “virtual sample”, as described in the original post, to reflect the hypothesised state of affairs in which the coin is fair: P(heads) = P(tails) = 0.5. This can be simulated without a coin (with whatever number of samples one wishes) and then compared against the observed counts of heads vs. tails of the coin in question.
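If I understand the proposal, the mechanics would look something like this sketch (the flip counts and the virtual sample size are made up):

```python
# "Virtual sample" approach to coin fairness: fabricate a sample that
# embodies the null P(heads) = P(tails) = 0.5, then compare it to the
# observed flips with Fisher's exact test.
from scipy.stats import fisher_exact

observed = [62, 38]     # heads, tails actually flipped (made-up counts)
virtual = [5000, 5000]  # imaginary "fair coin" flips, any size we like

_, p_value = fisher_exact([observed, virtual])
print(f"p = {p_value:.4f}")
```

Note that the resulting p-value depends on how large you make the imaginary sample, which already hints at the problem discussed below.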
The same applies to any other situation where there is a theoretically derived prediction about probabilities to be tested (for example, “is my multiple-choice exam so hard that students are not performing above chance?” If there are four choices, we can test against a hypothetical P = .25).
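The direct way to test such a point null needs no virtual sample at all; a minimal sketch with made-up exam numbers:

```python
# Direct test of "students perform at chance" on a four-choice exam:
# the null probability P = 0.25 is plugged in explicitly.
from scipy.stats import binomtest

correct, total = 310, 1200  # made-up exam results
result = binomtest(correct, total, p=0.25, alternative="greater")
print(f"p = {result.pvalue:.4f}")  # large p: no evidence of above-chance scoring
```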
But there you have a probabilistically formulated null hypothesis (coin is fair, students perform at chance level). In the equids example, the null hypothesis is that the probability of sampling a zebra is 0, which is disproven by simply pointing out that you, in fact, sampled some zebras. It makes no sense to calculate a p-value.
I have no idea what Fisher’s test is supposed to do here. Show a correlation between the property of being a zebra and the property of being in the real, as opposed to the imaginary, sample? … That’s meaningless.
Agreed! Perhaps Fisher’s test was used because it can deal with small expected values in the cells of contingency tables (where chi-square is flawed), but “small” must still be > 0.
Which just made me think that it would have been hilarious if Dr. Yagami had realised this and continued by saying that, because of it, and in order to make the statistical test applicable, he was going to add some random noise.
I don’t think there’s any examination using a statistical test with a virtual sample that can’t be done as well or better with another statistical test. The whole point of Fisher’s test is that you have four cell counts from a distribution with an unknown parameter. If you pretend that a distribution which is in fact known under the null is unknown, you are throwing information away.
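To illustrate with the coin example (same made-up flips as before): the Fisher p-value drifts with the size of the fabricated sample, while an exact binomial test simply uses the null probability that is already known.

```python
# Fisher's test treats the null probability as an unknown nuisance
# parameter, so its answer depends on the fabricated sample's size;
# the exact binomial test uses the known null p = 0.5 directly.
from scipy.stats import binomtest, fisher_exact

heads, tails = 62, 38  # made-up observed flips

for n in (10, 100, 10_000):  # fair virtual samples of varying size
    _, p = fisher_exact([[heads, tails], [n // 2, n // 2]])
    print(f"virtual n = {n:>6}: Fisher p = {p:.4f}")

p_direct = binomtest(heads, heads + tails, p=0.5).pvalue
print(f"direct binomial test: p = {p_direct:.4f}")
```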