You should be deeply embarrassed if your model outputs an obviously wrong or obviously time-inconsistent answer even in a hypothetical situation.
Suppose you have a particle accelerator that goes up to half the speed of light. You notice an effect whereby faster particles become harder to accelerate.
You curve-fit this effect and find that $\frac{c}{\sqrt{c^2 - v^2}}$ and $1 + \frac{1}{2}\frac{v^2}{c^2} + \frac{1}{2}\frac{v^4}{c^4}$ both fit the data, with the first fitting slightly better. However, when you test the first formula on the case of a particle travelling at twice the speed of light, you get back nonsensical imaginary numbers. Clearly the real formula must be the second one. (The real formula is actually the first one.)
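To make the setup concrete, here is a minimal sketch (made-up data, units where c = 1; the function names and parameters are my own illustration, not from the example): both candidate forms are fit to measurements below 0.5c and then probed at 2c.

```python
# Minimal sketch, units where c = 1; data and function names are illustrative.
import numpy as np
from scipy.optimize import curve_fit

c = 1.0
v = np.linspace(0.0, 0.5 * c, 50)            # accelerator only reaches half of c
effort = c / np.sqrt(c**2 - v**2)            # the "effect" seen in the data

def model_a(v, a):
    # candidate 1: a * c / sqrt(c^2 - v^2)
    return a * c / np.sqrt(c**2 - v**2)

def model_b(v, a, b):
    # candidate 2: 1 + a*(v/c)^2 + b*(v/c)^4
    return 1 + a * (v / c)**2 + b * (v / c)**4

params_a, _ = curve_fit(model_a, v, effort, p0=[1.0])
params_b, _ = curve_fit(model_b, v, effort, p0=[0.5, 0.5])

# Both candidates reproduce the measured range almost perfectly.
print(np.max(np.abs(model_a(v, *params_a) - effort)))
print(np.max(np.abs(model_b(v, *params_b) - effort)))

# Probing far outside the data: at v = 2c the first form goes imaginary,
# while the second happily returns an ordinary (and meaningless) number.
v_ftl = 2.0 * c
print(params_a[0] * c / np.emath.sqrt(c**2 - v_ftl**2))   # complex result
print(model_b(v_ftl, *params_b))                          # plain real number
```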
A good model will often give a nonsensical answer when asked a nonsensical question, and nonsensical questions don’t always look nonsensical.
This is worth being explicit about. I took the advice as applying to hypothetical situations consistent with the intended usage of the model. A sports prediction model probably doesn't need to work for low-gravity situations, and most physics models don't need to include FTL particles.
It would be nice to say more formally that you can fix it by improving the model OR by specifying which subset of the imagination space the model is intended for.
Edit based on replies (thank you for making me think further): A LOT hinges on the meta-model and the theories behind your belief that the model is useful in the first place. It makes sense to be VERY skeptical of correlations found in data with no explanation of what they mean. For these, I agree that any failure is grounds to reject. For models that start with a hypothesis, it makes a lot of sense to use the theory to identify reasonable exclusions: regions where you don't throw out the model for weird results, because you don't expect it to apply there.
I was imagining doing this before the speed-of-light limit was known, in which case you can find yourself saying that the subset where the model produces sensible results is the subset of imagination space the model is intended for.
But to be fair, if you then fixed the model to output errors once you exceeded the speed of light, as the post recommends, you would have come up with a model that actually communicated a deep truth. There’s no reason a model has to be continuous, after all.
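In that spirit, a sketch of what "output errors once you exceed the speed of light" could look like in code (the function name and error message are my own illustration):

```python
# A minimal sketch: instead of silently returning nonsense past v = c,
# the model refuses to answer outside its intended domain.
import math

def acceleration_effort(v, c=299_792_458.0):
    """Relative effort to accelerate a particle at speed v (the gamma factor)."""
    if abs(v) >= c:
        raise ValueError(f"model not defined for |v| >= c (got v={v})")
    return c / math.sqrt(c**2 - v**2)

print(acceleration_effort(0.5 * 299_792_458.0))   # ~1.155, well-defined

try:
    acceleration_effort(2 * 299_792_458.0)        # outside the intended subset
except ValueError as e:
    print("model declined to answer:", e)
```

The discontinuity at the domain boundary is the point: the refusal itself encodes the claim that nothing travels faster than light.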
Similarly, a linear regression to some data will often behave well for interpolation but less well for distant extrapolation.
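A quick illustrative sketch with made-up data (the specific curve and numbers are just for demonstration):

```python
# A linear fit to a gently curved relationship: fine inside the observed
# range, badly wrong when extrapolated far outside it. Data is synthetic.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = np.sqrt(x) + rng.normal(0, 0.05, x.size)      # slowly curving "true" relationship

slope, intercept = np.polyfit(x, y, 1)            # ordinary least-squares line

def predict(x_new):
    return slope * x_new + intercept

print(abs(predict(5.0) - np.sqrt(5.0)))           # interpolation: small error
print(abs(predict(100.0) - np.sqrt(100.0)))       # distant extrapolation: large error
```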