Switching back to your framing: if X itself is large enough to contain multiple far-apart chunks of variables, then a PCA on X should yield natural abstractions (roughly speaking).
Agreed. I was mainly wondering whether it could still hold if X is small, though it might be hard to define a cutoff threshold.
Perhaps another way to frame it: if you perform PCA, you would likely get variables carrying info about both the external summary data of X and the internal dynamics of X (which would not be visible externally). It might be worth examining things like the relative dimensionality of PCA vs. SVD, to investigate how much info about internal dynamics natural abstractions let you throw away.
(This might be especially interesting in a setting where it is tiled over time? Then the internal dynamics of X play a bigger role.)
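To make the PCA-vs-SVD dimensionality comparison concrete, here is a minimal linear-Gaussian sketch. It assumes (this is not stated above) that "the SVD method" means an SVD of the cross-covariance between X and some far-away variables Y. The toy model, block sizes, noise levels and the 5% cutoff in the hypothetical `effective_rank` helper are all illustrative choices. X carries one redundant summary signal plus four local internal factors, and only the summary reaches Y.

```python
import numpy as np


def effective_rank(values, rel_tol=0.05):
    """Count values larger than rel_tol times the largest (hypothetical cutoff)."""
    values = np.sort(np.abs(values))[::-1]
    return int(np.sum(values > rel_tol * values[0]))


def pca_vs_svd_ranks(n=20_000, d=40, k=4, m=10, seed=0):
    # Assumption: a toy linear-Gaussian system, not a model from the discussion.
    rng = np.random.default_rng(seed)
    block = d // k

    # Summary signal s: loaded redundantly onto every variable of X,
    # and the only thing that reaches the far-away variables Y.
    s = rng.standard_normal((n, 1))
    a_s = np.ones(d)

    # Internal dynamics: k local factors, each touching one block of X only.
    z = rng.standard_normal((n, k))
    B = np.zeros((k, d))
    for i in range(k):
        B[i, i * block:(i + 1) * block] = 1.0

    X = s @ a_s[None, :] + z @ B + 0.5 * rng.standard_normal((n, d))
    Y = s @ np.ones((1, m)) + 0.5 * rng.standard_normal((n, m))

    # PCA: eigenvalues of the covariance of X alone.
    pca_eigs = np.linalg.eigvalsh(np.cov(X, rowvar=False))

    # "SVD method" (assumed): singular values of the X-Y cross-covariance.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    cross_cov = Xc.T @ Yc / (n - 1)
    svd_vals = np.linalg.svd(cross_cov, compute_uv=False)

    return effective_rank(pca_eigs), effective_rank(svd_vals)


pca_rank, svd_rank = pca_vs_svd_ranks()
print(pca_rank, svd_rank)  # expected: 4 1
```

PCA reports four significant components (the summary mixed in with the internal factors), while the cross-covariance SVD reports one. In this toy, the gap between the two counts is a rough measure of how much internal-dynamics info the far-away view discards.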
This is quite compatible with the idea of abstractions coming from noise wiping out info: even within X, noise prevents most info from propagating very far. The info which propagates far within X is also the info which is likely to propagate far beyond X itself; most other info is wiped out before it reaches the boundaries of X, and has low redundancy within X.
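A back-of-the-envelope version of the noise argument, under an assumed linear-Gaussian relay model (nothing above specifies one): a unit-variance source s is passed along `copies` independent chains of t noisy steps each. It uses the standard Gaussian-channel formula I = 0.5 * log(1 + r * g^2 / sigma^2), with gain g = rho^t and noise variance sigma^2 = 1 - rho^(2t); the name `chain_information` and the value rho = 0.9 are hypothetical.

```python
import math


def chain_information(t, rho=0.9, copies=1):
    """Mutual information (nats) between a unit-variance Gaussian source s
    and `copies` independent noisy relays of it, each t steps long.

    Assumed model: each step multiplies by rho and adds independent Gaussian
    noise of variance 1 - rho**2, so every node keeps unit variance and a
    relay's endpoint is rho**t * s plus noise of variance 1 - rho**(2 * t).
    """
    if t < 1:
        raise ValueError("t must be at least 1")
    gain_sq = rho ** (2 * t)
    noise_var = 1.0 - gain_sq
    snr = copies * gain_sq / noise_var
    return 0.5 * math.log1p(snr)


for t in (1, 5, 20, 50):
    single = chain_information(t)
    redundant = chain_information(t, copies=20)
    print(t, round(single, 4), round(redundant, 4), round(redundant / single, 2))
```

Both quantities decay with distance, but the ratio between the redundant and single-chain information grows with t, so far from the source whatever survives is dominated by the redundantly encoded part.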
🤔
That makes me think of another tangential thing. In a designed system, noise can often be kept low, and redundancy is often eliminated. So the PCA method might work better on “natural” (random or searched) systems than on designed systems, while the SVD method might work equally well on both.
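A toy contrast for the designed-system case, again assuming that the SVD method means an SVD of the X-Y cross-covariance. The setup is hypothetical: X is ten independent variables with no redundancy, and a far-away Y depends on just one of them, with little noise.

```python
import numpy as np


def effective_rank(values, rel_tol=0.05):
    """Count values larger than rel_tol times the largest (hypothetical cutoff)."""
    values = np.sort(np.abs(values))[::-1]
    return int(np.sum(values > rel_tol * values[0]))


def designed_system_ranks(n=20_000, d=10, m=8, seed=1):
    rng = np.random.default_rng(seed)
    # Assumed "designed" X: independent unit-variance variables, no redundancy.
    X = rng.standard_normal((n, d))
    # Far-away Y: depends only on X[:, 0], with low noise.
    Y = X[:, [0]] @ np.ones((1, m)) + 0.1 * rng.standard_normal((n, m))

    # PCA on X alone: all eigenvalues come out close to 1.
    pca_eigs = np.linalg.eigvalsh(np.cov(X, rowvar=False))

    # Cross-covariance SVD: one dominant singular value for the X[:, 0] direction.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    svd_vals = np.linalg.svd(Xc.T @ Yc / (n - 1), compute_uv=False)
    return effective_rank(pca_eigs), effective_rank(svd_vals)


pca_rank, svd_rank = designed_system_ranks()
print(pca_rank, svd_rank)  # expected: 10 1
```

Here PCA sees ten near-equal eigenvalues and so has nothing to single out, while the cross-covariance SVD still picks out the one relevant direction.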