So obviously as you mention, the whole thing about taking the infinite limits etc. is meant to be a hypothetical stand-in for doing things at a large scale. And similarly, obviously we only ever have finite observations, so it's an idealization there too.
But this makes me think that perhaps some useful insights about the range of applicability for abstractions could be derived by thinking about convergence rates. E.g. if the information in an abstraction is spread among n variables, then each "layer" (e.g. Markov blanket, resampling) could be expected to introduce noise on the scale of n^{-1/2}, so that seems to suggest that the abstraction is only valid up to a distance of around 1/n^{-1/2} = n^{1/2}.
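Just to make the scaling intuition concrete, here's a toy numerical sketch. It assumes, purely for illustration, that the abstraction is a mean over n i.i.d. unit-variance variables (so one "layer" of summarization introduces noise of scale n^{-1/2}), and that per-layer errors accumulate linearly, which is where the n^{1/2} validity horizon comes from:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_noise(n, trials=10_000):
    """Estimate the noise one averaging 'layer' introduces when the
    abstraction is a mean over n i.i.d. unit-variance variables."""
    samples = rng.normal(0.0, 1.0, size=(trials, n))
    # The std of the sample mean should come out close to n**-0.5.
    return samples.mean(axis=1).std()

for n in [100, 400, 1600]:
    noise = layer_noise(n)
    # If each layer adds noise ~ n**-0.5 and the errors accumulate
    # linearly, the total error reaches order 1 after roughly
    # 1 / noise ~ n**0.5 layers -- the "validity distance" above.
    print(f"n={n}: per-layer noise ~ {noise:.4f}, horizon ~ {1 / noise:.0f} layers")
```

Whether layer errors really add linearly (rather than, say, as a random walk, which would give a longer n horizon) seems like exactly the kind of thing the finite-version bounds would need to pin down.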
I’m not really sure how to translate this into practical use, because it seems like it would require some conversion factor between variable count and distance. I guess one could translate it into a comparative rule, like "getting X times more observations of a system should allow you to understand it in a √X times broader setting", but that is probably some mixture of too well-known, too abstract, and too wrong to be useful.
But regardless of whether I’ll come up with any uses for this, I’d be curious if you or anyone else has any ideas here.
Yeah, the general problem of figuring out approximations and bounds for finite versions of these theorems has been a major focus for me over the past couple weeks, and will likely continue to be a major focus over the next month. Useful insights have already come out of that, and I expect more will come.
One thing I can’t help but think is: