I wonder if your more detailed model could be included in a derivation like the one in the post above. The post assumes that every observation the model has (the previous y values) is correct. Your idea of mislabellings or imperfect observations might be includable as a rule saying each y has an X% chance of simply being wrong.
We can imagine two similar models. [1] A “zoomed in” model consisting of two parts: first a model of the real world, and second a model of the observation errors. [2] A “zoomed out” model that lumps the real world and the observation errors together and fits to the combined data. If [2] sees errors, the model is tweaked to predict those errors. Mathematically equivalent, but importantly different in spirit, is model [1], which, when it encounters an obvious error, does not update the world model but might update the model of the observation errors.
My feeling is that some of this “overfitting” discussion might be fed by people intuitively wanting the model to do [1] while actually building or studying the much simpler [2]. When [2] folds observation errors into the same map it uses to describe the world, we cry “overfitting”.
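To make the [1]-versus-[2] distinction concrete, here is a minimal sketch (my own illustration, not from the post): synthetic data with a 10% chance each label is “just wrong”. Model [2] is a plain least-squares fit over everything; model [1] maximises a mixture likelihood with an explicit error term, so junk points get explained by the error model instead of dragging the world model around. The 10% rate, the noise scale, and the uniform junk range are all assumptions made up for the demo.

```python
import numpy as np

# Synthetic world: y = 2x + small noise, but 10% of labels are "just wrong".
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2.0 * x + rng.normal(0, 0.05, size=x.size)
bad = rng.random(x.size) < 0.10            # mislabelled points
y[bad] = rng.uniform(-5, 5, size=bad.sum())

# Model [2]: one zoomed-out fit; the junk points pull the world model around.
slope_2 = np.polyfit(x, y, 1)[0]

# Model [1]: world model (slope) plus an explicit error model. Each point's
# likelihood is a mixture: with prob (1 - eps) it is Normal(slope*x, sigma),
# with prob eps it is junk, uniform on [-5, 5]. (Intercept assumed known = 0
# to keep the sketch one-dimensional.)
def log_lik(slope, eps=0.10, sigma=0.05):
    good = np.exp(-0.5 * ((y - slope * x) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    junk = 1.0 / 10.0                       # uniform density on [-5, 5]
    return np.sum(np.log((1 - eps) * good + eps * junk))

# Simple grid search over the slope; the true value 2.0 lies on the grid.
slopes = np.linspace(0, 4, 401)
slope_1 = slopes[np.argmax([log_lik(s) for s in slopes])]

print(slope_2, slope_1)  # slope_1 recovers ~2.0; slope_2 is at the junk's mercy
```

The point of the sketch: the maths of [1] is just a different likelihood, yet in spirit an outlier updates only the error model (its mixture responsibility), not the picture of the world.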