I’ve been thinking a fair bit about this question, though with a slightly different framing.
Let’s start with a neighborhood X. Then we can pick a bunch of different far-away neighborhoods Y1,...,Yn, and each of them will contain some info about summary(X). But then we can flip it around: if we do a PCA on (X,Y1,...,Yn), then we should expect to see components corresponding to summary(X), since all the variables which contain that information will covary.
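To make that concrete, here's a minimal numpy sketch under a toy linear model I'm assuming purely for illustration (one latent summary driving every neighborhood, plus independent local noise; none of the specific names or dimensions come from the setup above):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, vars_per_nbhd, n_nbhds = 5000, 10, 6

# summary(X): the one piece of info shared between X and every far-away Yi.
S = rng.normal(size=(n_samples, 1))

def neighborhood(summary):
    """A neighborhood's variables: a random projection of the summary,
    plus independent local noise that carries no long-range info."""
    loadings = rng.normal(size=(1, vars_per_nbhd))
    return summary @ loadings + rng.normal(size=(n_samples, vars_per_nbhd))

# Stack X and the far-away neighborhoods Y1, ..., Yn into one data matrix.
data = np.hstack([neighborhood(S) for _ in range(n_nbhds)])

# PCA = SVD of the centered data matrix.
data -= data.mean(axis=0)
_, _, Vt = np.linalg.svd(data, full_matrices=False)
pc1_scores = data @ Vt[0]

print("|corr(PC1, summary)| =", abs(np.corrcoef(pc1_scores, S.ravel())[0, 1]))
# Prints a value near 1: the shared summary makes all neighborhoods covary,
# so it dominates the top component, while local noise spreads thinly across
# the rest of the spectrum.
```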
Switching back to your framing: if X itself is large enough to contain multiple far-apart chunks of variables, then a PCA on X should yield natural abstractions (roughly speaking). This is quite compatible with the idea of abstractions coming from noise wiping out info: even within X, noise prevents most info from propagating very far. The info which propagates far within X is also the info which is likely to propagate far beyond X itself; most other info is wiped out before it reaches the boundaries of X, and has low redundancy within X.
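A companion sketch for the within-X version, again under assumed toy dynamics (X as a 1-D chain where one global signal propagates everywhere but local AR(1) fluctuations decay within a few steps; all the specifics are my own stand-ins):

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, chain_len, decay = 5000, 60, 0.5

# Info that survives propagation: one global signal seen along all of X.
g = rng.normal(size=(n_samples, 1))

# Info that noise wipes out: AR(1) fluctuations with short correlation length.
local = np.zeros((n_samples, chain_len))
local[:, 0] = rng.normal(size=n_samples)
for i in range(1, chain_len):
    local[:, i] = decay * local[:, i - 1] + rng.normal(size=n_samples)

X = g + local                     # X's variables along the chain
X -= X.mean(axis=0)

_, _, Vt = np.linalg.svd(X, full_matrices=False)
pc1 = X @ Vt[0]                   # Vt[0] is unit-norm, so this is a projection
X_wiped = X - np.outer(pc1, Vt[0])

corr_ends = lambda A: np.corrcoef(A[:, 0], A[:, -1])[0, 1]
print("corr between far ends of X:     ", corr_ends(X))        # noticeably > 0
print("same, with top PC projected out:", corr_ends(X_wiped))  # ~ 0
```

The far-apart ends of the chain covary only through the globally propagating signal, which is exactly what the top component picks up; project it out and the long-range correlation vanishes.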
> Switching back to your framing: if X itself is large enough to contain multiple far-apart chunks of variables, then a PCA on X should yield natural abstractions (roughly speaking).
Agreed. I was mainly thinking of whether it could still hold if X is small, though it might be hard to define a cutoff threshold.
Perhaps another way to frame it: if you perform PCA on X alone, you would likely get components with info about both the external summary data of X and the internal dynamics of X (which would not be visible externally). It might be relevant to examine things like the relative dimensionality for PCA vs SVD, to investigate how well natural abstractions allow throwing away info about internal dynamics (a sketch of one way to operationalize this follows below).
(This might be especially interesting in a setting where it is tiled over time, since then the internal dynamics of X play a bigger role.)
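I'm not sure this is the intended reading of "PCA vs SVD", but one way to operationalize the comparison: set the spectrum of PCA on X alone (which sees both the summary and internal dynamics) against the SVD of the cross-covariance between X and a far-away Y (which only sees externally visible info). A sketch, with every modeling detail assumed:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d_x, d_y, d_internal = 5000, 12, 12, 5

S = rng.normal(size=(n, 1))                  # externally visible summary
internal = rng.normal(size=(n, d_internal))  # X-internal dynamics, invisible outside

# X carries the summary plus its internal dynamics; a far-away Y only the summary.
X = (S @ rng.normal(size=(1, d_x))
     + internal @ rng.normal(size=(d_internal, d_x))
     + 0.1 * rng.normal(size=(n, d_x)))
Y = S @ rng.normal(size=(1, d_y)) + 0.1 * rng.normal(size=(n, d_y))
X, Y = X - X.mean(0), Y - Y.mean(0)

pca_spectrum = np.linalg.svd(X, compute_uv=False) ** 2 / n     # eigenvalues of Cov(X)
cross_spectrum = np.linalg.svd(X.T @ Y / n, compute_uv=False)  # SVD of Cov(X, Y)

print("PCA eigenvalues of X:     ", np.round(pca_spectrum[:8], 2))
print("cross-cov singular values:", np.round(cross_spectrum[:8], 3))
# Expected pattern: the PCA spectrum has ~1 + d_internal sizeable values
# (summary + internal modes) before dropping to noise level, while the
# cross-covariance spectrum collapses after its first value -- the far-away
# Y only "sees" the summary.
```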
> This is quite compatible with the idea of abstractions coming from noise wiping out info: even within X, noise prevents most info from propagating very far. The info which propagates far within X is also the info which is likely to propagate far beyond X itself; most other info is wiped out before it reaches the boundaries of X, and has low redundancy within X.
🤔
That makes me think of another tangential thing. In a designed system, noise can often be kept low, and redundancy is often eliminated. So the PCA method might work better on “natural” (random or searched) systems than on designed systems, while the SVD method might work equally well on both.