Yeah, I think you need some additional assumptions on the models and behaviors, which you’re gesturing at with the “matching behaviors” and “inexact descriptions”. Otherwise it’s easy to find counterexamples: imagine the model is just a single N x N matrix of parameters; then in general there is no description of the behavior shorter than the model itself.
Yes, there are non-invertible (you might say “simpler”) behaviors which each occupy more parameter volume than any given invertible behavior, but random matrices are almost certainly invertible, so the actual optimization pressure towards low description length is infinitesimal.
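
To make the "almost certainly invertible" point concrete, here's a rough sketch in plain Python. The function names `numerical_rank` and `fraction_full_rank`, the Gaussian entries, and the tolerance are all illustrative assumptions on my part, not anything from the post. It samples random N x N matrices and counts how many come out full rank.

```python
import random


def numerical_rank(matrix, tol=1e-9):
    """Estimate the rank by Gaussian elimination with partial pivoting.

    `tol` is an assumed threshold below which a pivot is treated as zero.
    """
    rows = [row[:] for row in matrix]  # work on a copy
    n_rows, n_cols = len(rows), len(rows[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, n_rows), key=lambda i: abs(rows[i][col]))
        if abs(rows[pivot][col]) < tol:
            continue  # no usable pivot in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(rank + 1, n_rows):
            factor = rows[i][col] / rows[rank][col]
            for j in range(col, n_cols):
                rows[i][j] -= factor * rows[rank][j]
        rank += 1
    return rank


def fraction_full_rank(n, trials, seed=0):
    """Fraction of sampled n x n Gaussian matrices that are full rank."""
    rng = random.Random(seed)  # assumed seed, only for reproducibility
    full = 0
    for _ in range(trials):
        m = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
        if numerical_rank(m) == n:
            full += 1
    return full / trials


if __name__ == "__main__":
    print(fraction_full_rank(n=5, trials=200))
```

The singular matrices form a measure-zero set under any continuous distribution over entries, so you'd expect essentially every sample to be full rank. A hand-built singular matrix like `[[1.0, 2.0], [2.0, 4.0]]` is the kind of thing random sampling basically never produces.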