Peter, your point that we have different beliefs about the theories prior to looking at them is helpful. AFAICT theories don’t screen off theorists, though. My belief that the college baseball team will score at least one run in every game (“theory A”), including the next one (“experiment 21”), may reasonably be increased by a local baseball expert telling me so and by evidence about his expertise. This holds even if I independently know something about baseball.
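To make that concrete, here is a toy Bayes calculation (every number is mine, invented purely for illustration): even when I already have an independent prior from my own knowledge of baseball, the expert’s endorsement is additional evidence so long as he is more likely to vouch for theory A when it is true than when it is false.

```python
# Toy illustration only; every number below is an assumption I made up.
p_a = 0.70                  # my prior on theory A from independent baseball knowledge
p_vouch_given_a = 0.90      # chance the expert vouches for A if A is true
p_vouch_given_not_a = 0.30  # chance he vouches for A if A is false

posterior = (p_a * p_vouch_given_a) / (
    p_a * p_vouch_given_a + (1 - p_a) * p_vouch_given_not_a
)
print(round(posterior, 3))  # 0.875 > 0.70: his vouching still shifts my belief
```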
As to the effect of “number of parameters” on the theories’ probabilities, would you bet equally on the two theories if you were told that they contained an identical number of parameters? I wouldn’t, given the asymmetric information contained in the two experts vouching for the theories.
Tim, I agree that if you remove the distinct scientists and have the hypotheses produced instead by a single process (drawn from the same bucket), you should prefer whichever prediction has the highest prior probability. Do you mean that the prior probability is equal to the prediction’s simplicity or just that simplicity is a good rule of thumb in assigning prior probabilities? If we have some domain knowledge I don’t see why simplicity should correspond exactly to our priors; even Solomonoff Inducers move away from their initial notion of simplicity with increased data. (You’ve studied that math and I haven’t; is there a non-trivial updated-from-data notion of “simplicity” that has identical ordinal structure to an updated Solomonoff Inducer’s prior?)
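Here is a minimal sketch of what I have in mind (entirely my own toy model, not Tim’s math): weight a few hypothetical predictors by something like 2^(-description length), then update on repeated observations; the posterior ordering need not match the initial simplicity ordering.

```python
# Toy sketch, all numbers invented: a simplicity-style prior 2**(-length)
# over three hypothetical predictors, updated on n observed outcomes.
hypotheses = {
    # name: (description_length, probability it assigns to each observed outcome)
    "H_simple": (5, 0.50),
    "H_medium": (8, 0.80),
    "H_complex": (12, 0.95),
}

def posterior(hyps, n_observations):
    unnormalized = {
        name: 2.0 ** (-length) * p_obs ** n_observations
        for name, (length, p_obs) in hyps.items()
    }
    total = sum(unnormalized.values())
    return {name: w / total for name, w in unnormalized.items()}

print(posterior(hypotheses, 0))   # prior: H_simple dominates on simplicity alone
print(posterior(hypotheses, 20))  # after 20 results: H_complex has overtaken it
```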
Tyrrell, I like your solution a lot. One disagreement, though: as you say, if experts 1 and 2 are good probability theorists, T1 will contain the most likely predictions given the experimental results according to Expert 1, and T2 likewise according to Expert 2. Still, if the experts have different starting knowledge and at least one cannot see the other’s predictions, I don’t see anything very surprising in their “highest probability predictions given the data” calculations disagreeing with one another. This part isn’t in disagreement with you, but it is also relevant that if the space of outcomes is small, or if experiments 1-20 are part of some local regime that experiment 21 is not (e.g. physics at macroscopic scales, or housing prices before the bubble burst), it may not be surprising to see two theories that agree on a large body of data and diverge elsewhere; a toy example follows below. Theories that agree in one regime and disagree in others seem relatively common.
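For instance (my own toy example, not part of your solution): a theory can match another exactly on experiments 1-20 and still part ways on experiment 21.

```python
from math import factorial, prod

# Toy example, mine only: two "theories" that agree exactly on experiments
# 1-20 but disagree on experiment 21.
def theory_1(x):
    return x

def theory_2(x):
    # The product term is zero at every x in 1..20, so the theories coincide
    # there; at x = 21 it equals 20!/20! = 1 and they come apart.
    return x + prod(x - k for k in range(1, 21)) / factorial(20)

print(all(theory_1(x) == theory_2(x) for x in range(1, 21)))  # True
print(theory_1(21), theory_2(21))                             # 21 vs 22.0
```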
Alex, Bertil, and others, I may be wrong, but I think we should taboo “overfitting” and “ad hoc” for this problem and substitute mechanistic, probability-theory-based explanations for where phenomena like “overfitting” come from.