And I agree that such a method should settle on the simplest theory among all candidate theories. It should implement Occam’s razor.
It’s not quite that simple in practice. There’s a tradeoff here between accuracy in retrospect and theory simplicity. The two extreme pathological cases are:
You demand absolute accuracy in retrospect, i.e. P(observed data | hypothesis) = 1. This is the limit case of overfitting, and yields a GLUT (giant lookup table), which makes no predictions about the future, or only useless ones.
You demand maximum simplicity. This is the limit case of underfitting, and yields a maximum-entropy distribution.
You want something in between those two cases. I don’t know exactly where, but you would have to figure out some way to determine that point if you were, say, building an AGI.
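To make the tradeoff concrete, here’s a minimal sketch in Python (the toy data, the polynomial hypothesis class, and the penalty weights are all invented stand-ins, not anything from the discussion above): score each candidate hypothesis by its fit to the observed data minus a complexity cost. The two pathological cases then fall out as the zero-penalty and huge-penalty limits.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=x.shape)  # toy observed data

def fit_score(degree, penalty_weight):
    """Score a polynomial hypothesis: retrospective fit minus a complexity cost.

    A degree-19 polynomial through 20 points is the GLUT-like extreme
    (near-perfect accuracy in retrospect); degree 0 is maximum simplicity.
    """
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    log_likelihood = -np.sum(residuals ** 2)  # Gaussian noise, up to constants
    complexity = degree + 1                   # crude proxy: parameter count
    return log_likelihood - penalty_weight * complexity

# penalty_weight -> 0 picks the near-interpolating GLUT (overfitting);
# penalty_weight -> large picks the constant hypothesis (underfitting).
for w in (0.0, 0.1, 100.0):
    best = max(range(20), key=lambda d: fit_score(d, w))
    print(f"penalty weight {w}: best degree {best}")
```

Where exactly to set the penalty weight is precisely the open question above.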
I can’t really follow your earlier post. Specifically, I can’t parse your use of “predicts”, which you seem to use as a boolean value. But theories don’t “predict” or “not predict” outcomes in any absolute sense, they just assign probabilities to outcomes. Please explain your use of the phrase.
Sorry, the earlier post was in the context of a toy problem in which predictions were boolean. I should have mentioned that. (I had made this assumption explicit in an earlier comment.)
My argument shows that, in the limiting case of boolean predictions, we should trust successful theories constructed using a subset of the data over theories constructed using all the data, even if all the theories were constructed using Occam’s razor. This at least strongly suggests the same possibility in more realistic cases where the theories assign probability distributions.
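As a toy illustration of why the subset-built theory earns extra trust (the threshold hypothesis class and the noise rate here are my own inventions, not the original toy problem): a theory fit on half the data can be scored on the untouched half, whereas a theory fit on all the data leaves nothing untouched to test it on, so its retrospective accuracy is a weaker signal.

```python
import random

random.seed(1)

# Toy world: boolean outcomes from a threshold rule, flipped with 10% noise.
data = [(x, (x > 0.6) != (random.random() < 0.1))
        for x in (random.random() for _ in range(100))]

def accuracy(t, points):
    """Fraction of points whose boolean outcome the threshold t predicts."""
    return sum((x > t) == y for x, y in points) / len(points)

def fit_threshold(points):
    """Pick the threshold hypothesis with the best accuracy on `points`."""
    return max((i / 100 for i in range(101)), key=lambda t: accuracy(t, points))

train, holdout = data[:50], data[50:]
theory_subset = fit_threshold(train)  # constructed from a subset of the data
theory_all = fit_threshold(data)      # constructed from all the data

# The subset theory's holdout score is an honest estimate of predictive power;
# the all-data theory can only be scored on data it was already fit to.
print("subset theory on held-out data:", accuracy(theory_subset, holdout))
print("all-data theory on reused data:", accuracy(theory_all, data))
```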
Ok, I think I get your earlier post now. I think you might be overcomplicating things here.
Sure, if you’re not confident about what the correct simplicity prior is, you can get real evidence about which theory is likely to be stronger by observing things like its ability to correctly predict the outcomes of new experiments. And to the extent that this tells you something about the way the originating scientist generates theories, there should even be some shifting of probability mass regarding the power of other theories produced by the same scientist. But that’s quite a lot of indirection, and there are significant unknown factors that will dilute these shifts.
Attempting this is somewhat like trying to estimate the probability of a scientist being right about a famous problem in their field based on their prestige. There’s a signal, but it’s quite noisy.
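A back-of-the-envelope version of that noisy update (every number here is invented for illustration): treat “this theory is strong” as a hypothesis and each correctly predicted experiment as evidence, via a likelihood-ratio update.

```python
def p_strong(prior, p_correct_if_strong, p_correct_if_weak, n_correct):
    """Posterior that a theory is 'strong' after n correct boolean predictions.

    The likelihoods are among the unknown factors mentioned above; here
    they are simply assumed.
    """
    odds = prior / (1 - prior)
    odds *= (p_correct_if_strong / p_correct_if_weak) ** n_correct
    return odds / (1 + odds)

# The signal is real but noisy: if weak theories also predict correctly
# fairly often, even five successes move the posterior only modestly.
for n in range(6):
    print(n, round(p_strong(0.3, 0.9, 0.6, n), 3))
```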
If you know what simplicity looks like (exact simplicity is of course uncomputable, but you can always approximate) and how much it’s worth in terms of probability mass, you can make a much better guess as to which hypothesis is stronger by just looking at the actual hypotheses.
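As a sketch of what comparing the actual hypotheses could look like (the 2^−length prior is one common way to cash out Occam’s razor, and the numbers are hypothetical): weight each hypothesis by its description length, multiply by its likelihood on the data, and compare directly, with no new experiments needed.

```python
import math

def log_posterior(description_length_bits, log_likelihood):
    """Unnormalized log-posterior: a 2^-L simplicity prior times the likelihood.

    Description length stands in for the uncomputable notion of simplicity;
    log_likelihood is log P(observed data | hypothesis).
    """
    return -description_length_bits * math.log(2) + log_likelihood

# Hypothetical numbers: a simple theory that fits slightly worse versus a
# complex theory that fits the observed data slightly better.
simple = log_posterior(description_length_bits=40, log_likelihood=-12.0)
complex_ = log_posterior(description_length_bits=90, log_likelihood=-9.5)

print("simple theory: ", round(simple, 2))    # wins: the simplicity prior
print("complex theory:", round(complex_, 2))  # outweighs the better fit
```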
Looking at things like “how many experimental results did this hypothesis actually predict correctly?” is only informative to the extent that your understanding of simplicity and its value is lacking. Note that “lacking understanding of simplicity” isn’t meant to be especially disparaging; a good understanding of simplicity is hard to come by. There’s a reason the scientific process includes an inelegant workaround instead.