“ < Jaynes quote > … If Nature is one way, the likelihood of the data coming out the way we have seen will be one thing. If Nature is another way, the likelihood of the data coming out that way will be something else. But the likelihood of a given state of Nature producing the data we have seen, has nothing to do with the researcher’s private intentions. So whatever our hypotheses about Nature, the likelihood ratio is the same, and the evidential impact is the same, and the posterior belief should be the same, between the two experiments. At least one of the two Old Style methods must discard relevant information—or simply do the wrong calculation—for the two methods to arrive at different answers.”
This seems to be wrong. EY makes a sort of dualistic distinction between “Nature” (with a capital “N”) and the researcher’s mental state. But what EY (and possibly Jaynes, though I can’t tell from a short quote) is missing is that the researcher’s mental state is part of Nature, and in particular is part of the stochastic processes that generate the data for these two different experimental settings. Therefore, any correct inference technique, frequentist or Bayesian, must treat the two scenarios differently.
The point that EY is making there is kind of subtle. Think about it this way:
There’s a hidden double selected uniformly at random that’s between 0 and 1. You can’t see what it is; you can only press a button to see a 1 if another randomly selected double (over the same range) is higher than it, or 0 if the new double is less than or equal to it.
One researcher says “I’m going to press this button 100 times, and then estimate what the hidden double is.” The second researcher says “I’m going to press this button until my estimate of the double is at most .4.” Coincidentally, they see the exact same sequence of 100 presses, with 70 1s.
The primary claim is that the likelihood ratio from seeing 70 1s and 30 0s is the same for both researchers, and this seems correct to me. (How can the researcher’s intention change the hidden double?) The secondary claim is that the second researcher receives no additional information from the potentially surprising fact that he required 100 presses under his decision procedure. I have not put enough thought into it to determine whether or not the secondary claim is correct, but it seems likely to me that it is.
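A minimal sketch of the primary claim, under stated assumptions. The button shows a 1 with probability 1 - h for hidden double h, as described above. Because the "estimate at most .4" rule is not fully specified in this thread, the sketch swaps in a hypothetical data-dependent rule, "stop at the 70th 1", as the second design. The function names and the two candidate values of h are illustrative only.

```python
from math import comb, isclose

def fixed_n_likelihood(h, ones=70, zeros=30):
    """P(70 ones in exactly 100 presses | h) when N = 100 is fixed in advance."""
    n = ones + zeros
    # A press shows 1 with probability (1 - h) and 0 with probability h.
    return comb(n, ones) * (1 - h) ** ones * h ** zeros

def stop_at_kth_one_likelihood(h, ones=70, zeros=30):
    """P(the 70th one arrives on press 100 | h) under the hypothetical stopping rule."""
    n = ones + zeros
    # The last press must be a 1; the other 69 ones fall anywhere in the first 99.
    return comb(n - 1, ones - 1) * (1 - h) ** ones * h ** zeros

def likelihood_ratio(likelihood, h_a, h_b):
    """Ratio of the likelihoods of the same data under two candidate values of h."""
    return likelihood(h_a) / likelihood(h_b)

ratio_fixed = likelihood_ratio(fixed_n_likelihood, 0.3, 0.4)
ratio_stop = likelihood_ratio(stop_at_kth_one_likelihood, 0.3, 0.4)
print(ratio_fixed, ratio_stop)
```

The two designs differ only in a combinatorial factor that does not involve h, so it cancels in the ratio. This says nothing by itself about the secondary claim.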
Split the researchers that generate the data from the reasoner who is trying to estimate the hidden double from the data.
What is the data that the estimator receives? There is clearly a string of 100 bits indicating the results of the comparisons, but there is also another datum which indicates that the experiment was stopped after 100 iterations. This is a piece of evidence which must be included in the model, and the way to include it depends on the estimator’s knowledge of the stopping criterion used by the data generator.
The estimator has to take into account the possibility of cherry picking.
EDIT:
I think I can use an example:
Suppose that I give you N ≈ 10^9 bits of data generated according to the process you describe, and I declare that I had precommitted to stop gathering data after exactly N bits. If you trust me, then you must believe that you have an extremely accurate estimate of the hidden double. After all, you are using 1 gigabit of data to estimate less than 64 bits of entropy!
But then you learn that I lied about the stopping criterion, and I had in fact precommitted to stop gathering data at the point that it would have fooled you into believing with very high probability that the hidden number was, say, 0.42.
Should you update your belief on the hidden double after hearing of my deception? Obviously you should. In fact, the observation that I gave you so much data now makes the estimate extremely suspect, since the more data I give you the more I can manipulate your estimate.
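To put a rough number on "extremely accurate" in the trusted fixed-N case: a sketch assuming a uniform prior on the hidden double, so that the posterior after seeing the counts is Beta(zeros + 1, ones + 1). The specific counts below are made up for illustration and are not from the thread.

```python
from math import sqrt

def beta_posterior_sd(zeros, ones):
    """Posterior standard deviation of h under a uniform prior.

    A 0 appears with probability h, so the posterior is Beta(zeros + 1, ones + 1).
    """
    a, b = zeros + 1, ones + 1
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

# Hypothetical counts: the same 42% share of zeros at two sample sizes.
sd_small = beta_posterior_sd(zeros=42, ones=58)
sd_large = beta_posterior_sd(zeros=420_000_000, ones=580_000_000)
print(sd_small, sd_large)
```

This only covers the case where the stated stopping rule is believed; it does not model the deception.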
So, suppose I know the stopping criterion and the number of button presses that it took to stop the sequence, but I wasn’t given the actual sequence.
It seems to me like I can use the two of those to recreate the sequence, for a broad class of stopping criteria. “If it took 100 presses, then clearly it must be 70 1s and 30 0s, because if it had been 71 1s and 29 0s he would have stopped then and there would be only 99 presses, but he wouldn’t have stopped at 69 1s and 30 0s.” I don’t think I have any additional info.
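The reconstruction described above can be sketched for stopping rules that depend only on the running counts. This is a rough illustration, not the rule from the thread: `stops` is a hypothetical predicate, and the example rule "stop at the 70th 1" is an assumption chosen because it has an unambiguous answer. Note that it recovers the counts, not the order of the bits.

```python
def final_counts(stops, n_presses):
    """Return every (ones, zeros) pair at which `stops` first fires on press n_presses."""
    # States still running after each press, tracked as (ones, zeros).
    alive = {(0, 0)}
    for _ in range(n_presses - 1):
        next_alive = set()
        for ones, zeros in alive:
            for state in ((ones + 1, zeros), (ones, zeros + 1)):
                if not stops(*state):
                    next_alive.add(state)
        alive = next_alive
    # The final press must be the first one on which the rule fires.
    finals = set()
    for ones, zeros in alive:
        for state in ((ones + 1, zeros), (ones, zeros + 1)):
            if stops(*state):
                finals.add(state)
    return finals

def stop_at_70_ones(ones, zeros):
    """Hypothetical rule: stop as soon as 70 ones have been seen."""
    return ones >= 70

print(final_counts(stop_at_70_ones, 100))
```

For a rule like this one, the press count and the rule together leave exactly one possible pair of counts. Other rules may leave several.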
Should you update your belief on the hidden double after hearing of my deception? Obviously you should.
Update it to what? Assuming that the data is not tampered with, just that your stopping criterion was pointed at a particular outcome, it seems that unless the double is actually very close to 0.42 then you are very unlikely to ever stop!* It looks like the different stopping criteria impose conditions on the order of the dataset, but the order is independent of the process that generates whether each bit is a 1 or a 0, and thus should be independent of my estimate of the hidden double.
* If you imagine multiple researchers, each of whom gets a different sequence, and I only hear from some of the researchers, then yes, it seems like selection bias is a problem. But the specific scenario under consideration is two researchers with identical experimental results drawing different inferences from those results, which is different from two researchers with differing experimental setups having different distributions of possible results.
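The claim that a rule aimed at 0.42 rarely fires unless the hidden double is near 0.42 can be checked with a rough simulation. Everything specific here is an assumption: the estimate is taken to be the fraction of 0s, the rule fires once that fraction is within 0.01 of 0.42 after at least 200 presses, and runs are capped at 2000 presses. None of these numbers come from the thread.

```python
import random

def run_until_targeted_stop(hidden, rng, target=0.42, tol=0.01,
                            min_presses=200, max_presses=2000):
    """Press until the fraction of 0s is within tol of target; None if never."""
    zeros = 0
    for n in range(1, max_presses + 1):
        # The button shows 0 when the fresh draw is <= hidden (probability = hidden).
        if rng.random() <= hidden:
            zeros += 1
        if n >= min_presses and abs(zeros / n - target) <= tol:
            return n
    return None

def stop_fraction(hidden, runs=100, seed=0):
    """Fraction of simulated runs in which the targeted rule ever fires."""
    rng = random.Random(seed)
    stopped = sum(
        run_until_targeted_stop(hidden, rng) is not None for _ in range(runs)
    )
    return stopped / runs

near = stop_fraction(0.42)  # hidden double sits at the target
far = stop_fraction(0.70)   # hidden double far from the target
print(near, far)
```

With a cap on the number of presses, the far case should almost never stop, which is the point made above. The sketch does not address what the estimator should conclude in the runs that do stop.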
Different information about part of nature is not sufficient to change an inference—the probabilities could be independent of the researcher’s intentions.
The posterior probability of the observed data given the hidden variable of interest is in general not independent from the intentions of the researcher who is in charge of the data generation process.