Hrm, I’d say go with whichever is simpler (pick your favorite reasonable measure of a hypothesis’s complexity), for the usual reasons: fewer bits to describe it means less stuff that has to be “just so”, etc. Of course, adjust this a bit if one of the hypotheses has a significantly different prior than the other because of previously learned info. But yeah, the less complex one that works is more likely to be closer to the underlying dynamic.
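To put a rough number on "fewer bits", here is a minimal sketch assuming one particular complexity measure: a prior that weights each hypothesis by 2 to the minus its description length in bits. That choice of prior, the function name `prior_odds`, and the bit counts are all assumptions made up for illustration, and it also assumes both hypotheses fit the data equally well.

```python
def prior_odds(bits_a: int, bits_b: int) -> float:
    """Odds favoring hypothesis A over B under an assumed 2**-bits prior.

    Assumes both hypotheses explain the observed data equally well,
    so only their description lengths matter.
    """
    # P(A) / P(B) = 2**-bits_a / 2**-bits_b = 2**(bits_b - bits_a)
    return 2.0 ** (bits_b - bits_a)


# Hypothetical description lengths: A takes 40 bits, B takes 50 bits.
print(prior_odds(40, 50))  # 1024.0
```

Under that assumption, every extra bit that has to be "just so" halves the hypothesis's share of the prior, so ten extra bits cost about a factor of a thousand.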
If you’re handed the two hypotheses as black boxes, so that you can’t actually see inside them and work out which is more complex, then go with the first one, since it’s more likely to be the less complex of the two. At most, only the first ten data points could have been explicitly hard-coded into it in some way; it genuinely predicted the next ten. The second one could potentially have all twenty data points hard-coded into it in some way, which would make it more complex and thus less likely to have anything resembling the underlying dynamic encoded in it.
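The hard-coding worry can be made concrete with a toy sketch. Everything here is an assumption for illustration: the "underlying dynamic" is taken to be y = 3x + 1, a "memorizer" stands in for a theory that is nothing but a lookup table, and `make_memorizer`, `make_line`, and `hits` are hypothetical helper names.

```python
# Toy data: 20 observations from an assumed underlying rule y = 3x + 1.
data = [(x, 3 * x + 1) for x in range(20)]
first_ten, next_ten = data[:10], data[10:]


def make_memorizer(seen):
    """A 'theory' that is only a lookup table of the points it was shown."""
    table = dict(seen)
    return lambda x: table.get(x)  # returns None for any x it never saw


def make_line(seen):
    """A 'theory' that encodes a rule: the line through the first two points."""
    (x0, y0), (x1, y1) = seen[0], seen[1]
    slope = (y1 - y0) / (x1 - x0)
    return lambda x: y0 + slope * (x - x0)


def hits(theory, points):
    """Count how many points the theory reproduces exactly."""
    return sum(1 for x, y in points if theory(x) == y)


memorizer_10 = make_memorizer(first_ten)  # built from the first 10 points only
line_10 = make_line(first_ten)            # built from the first 10 points only
memorizer_20 = make_memorizer(data)       # built after seeing all 20 points

print(hits(memorizer_10, next_ten))  # 0: pure memorization predicts nothing new
print(hits(line_10, next_ten))       # 10: a real rule carries over
print(hits(memorizer_20, data))      # 20: fits everything, yet encodes no rule
```

A black box built from the first ten points that still gets the next ten right can’t be the pure memorizer, while a black box built after all twenty could be `memorizer_20` and you couldn’t tell from the fit alone.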