Also, a question: one way you can get an abstraction of the neighbourhood is via an SVD between the neighbourhood’s variables and other variables far away, but another way is just to apply PCA to the variables in the neighbourhood itself. Does this yield the same result, or is there some difference? My guess would be that PCA yields variables highly correlated with what you get from the SVD.
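To make the comparison concrete, here’s a minimal numpy sketch of what I have in mind, on a made-up toy model (a 1-D chain of variables with exponentially decaying correlations plus independent noise; the chain length, correlation length, and noise scale are all arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n_vars, n_samples = 60, 20000

# Toy model (my invention): variables on a 1-D chain with exponentially
# decaying correlations (length scale 15) plus independent local noise.
idx = np.arange(n_vars)
cov = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 15.0)
data = rng.multivariate_normal(np.zeros(n_vars), cov, size=n_samples)
data += 0.5 * rng.normal(size=data.shape)

X = data[:, 25:35]                            # the neighbourhood
Y = np.hstack([data[:, :10], data[:, 50:]])   # variables far away on the chain
X_c, Y_c = X - X.mean(axis=0), Y - Y.mean(axis=0)

# Method 1: PCA on the neighbourhood itself.
eigvals, eigvecs = np.linalg.eigh(X_c.T @ X_c / (n_samples - 1))
pca_dirs = eigvecs[:, ::-1][:, :2]            # top two principal directions of X

# Method 2: SVD of the cross-covariance between X and the far-away variables.
U, S, Vt = np.linalg.svd(X_c.T @ Y_c / (n_samples - 1))
svd_dirs = U[:, :2]                           # directions in X most informative about Y

# Sign-insensitive overlaps; entries near 1 would mean the methods roughly agree.
print(np.abs(pca_dirs.T @ svd_dirs))
```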
Finally, from intuition about PCA, I would assume that the most important variable would tend to correspond to some sort of “average activation” in the neighbourhood, and the second most important variable would tend to correspond to the difference between the activation at the top of the neighbourhood and the activation at the bottom. Defining this precisely might be a bit hard, as the signs of nodes are presumably somewhat arbitrary, so one would have to come up with some way to adjust for that; but I would be curious whether this looks anything like what you’ve found?
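A quick sanity check of that intuition, just on a made-up smooth 10-site covariance, with the arbitrary eigenvector signs normalised away:

```python
import numpy as np

# Shape of the top two principal components of a smooth 10-site
# neighbourhood covariance (correlation length 5; numbers arbitrary).
idx = np.arange(10)
cov = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 5.0)
eigvals, eigvecs = np.linalg.eigh(cov)
top = eigvecs[:, ::-1][:, :2]

# Eigenvector signs are arbitrary, so normalise each to start positive.
top *= np.sign(top[0, :])

print(np.round(top[:, 0], 2))  # roughly constant: an "average activation"?
print(np.round(top[:, 1], 2))  # roughly monotone: a top-vs-bottom difference?
```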
I guess in a way both of these hypotheses contradict your reasoning for why the natural abstraction hypothesis holds (though they are quite compatible with it still holding), in the sense that your natural abstraction hypothesis is built on the idea that noisy interactions just outside of the neighbourhood wipe away lower-level details, while my intuition is that a lot of the information needed for abstraction could be recovered from what is going on within the neighbourhood.
I’ve been thinking a fair bit about this question, though with a slightly different framing.
Let’s start with a neighborhood X. Then we can pick a bunch of different far-away neighborhoods Y1,...,Yn, and each of them will contain some info about summary(X). But then we can flip it around: if we do a PCA on (X,Y1,...,Yn), then we should expect to see components corresponding to summary(X), since all the variables which contain that information will covary.
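A minimal sketch of that claim, under an assumed generative story where a single shared signal stands in for summary(X) (the block sizes and noise levels here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 20000

# Assumed generative story: one long-range "summary" signal feeds X and
# three far-away neighborhoods; everything else is independent local noise.
summary = rng.normal(size=(n_samples, 1))
blocks = [summary @ rng.normal(size=(1, 8)) + rng.normal(size=(n_samples, 8))
          for _ in range(4)]                # blocks[0] is X, the rest are the Yi

joint = np.hstack(blocks)
joint -= joint.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(joint, rowvar=False))
top_pc = eigvecs[:, -1]                     # top principal direction of (X, Y1, ..., Yn)

# If the reasoning is right, the top component tracks the shared summary
# (up to sign), since it is the only thing that covaries across all blocks.
print(abs(np.corrcoef(joint @ top_pc, summary[:, 0])[0, 1]))
```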
Switching back to your framing: if X itself is large enough to contain multiple far-apart chunks of variables, then a PCA on X should yield natural abstractions (roughly speaking). This is quite compatible with the idea of abstractions coming from noise wiping out info: even within X, noise prevents most info from propagating very far. The info which propagates far within X is also the info which is likely to propagate far beyond X itself; most other info is wiped out before it reaches the boundaries of X, and has low redundancy within X.
Switching back to your framing: if X itself is large enough to contain multiple far-apart chunks of variables, then a PCA on X should yield natural abstractions (roughly speaking).
Agree, I was mainly thinking of whether it could still hold if X is small. Though it might be hard to define a cutoff threshold.
Perhaps another way to frame it: if you perform PCA, you would likely get variables with info about both the external summary data of X and the internal dynamics of X (which would not be visible externally). It might be relevant to examine things like the relative dimensionality for PCA vs SVD, to investigate how well natural abstractions allow throwing away info about internal dynamics.
(This might be especially interesting in a setting where the system is tiled over time? As then the internal dynamics of X play a bigger role.)
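For instance, on the same arbitrary toy chain as in the sketch above, one could compare the PCA spectrum of X with the singular value spectrum of the cross-covariance, and count how many components clear some crude threshold:

```python
import numpy as np

rng = np.random.default_rng(0)
n_vars, n_samples = 60, 20000

# Same arbitrary toy chain as above: long-range correlations plus local noise.
idx = np.arange(n_vars)
cov = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 15.0)
data = rng.multivariate_normal(np.zeros(n_vars), cov, size=n_samples)
data += 0.5 * rng.normal(size=data.shape)

X = data[:, 25:35]
Y = np.hstack([data[:, :10], data[:, 50:]])
X_c, Y_c = X - X.mean(axis=0), Y - Y.mean(axis=0)

pca_spectrum = np.linalg.eigvalsh(X_c.T @ X_c / (n_samples - 1))[::-1]
svd_spectrum = np.linalg.svd(X_c.T @ Y_c / (n_samples - 1), compute_uv=False)

# Crude effective-dimension count: components above 5% of the leading one.
# PCA should count internal + external structure; the cross-covariance SVD
# should only count what is visible from far away.
print((pca_spectrum > 0.05 * pca_spectrum[0]).sum(),
      (svd_spectrum > 0.05 * svd_spectrum[0]).sum())
```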
This is quite compatible with the idea of abstractions coming from noise wiping out info: even within X, noise prevents most info from propagating very far. The info which propagates far within X is also the info which is likely to propagate far beyond X itself; most other info is wiped out before it reaches the boundaries of X, and has low redundancy within X.
🤔
That makes me think of another tangential thing. In a designed system, noise can often be kept low, and redundancy is often eliminated. So the PCA method might work better on “natural” (random or searched) systems than on designed systems, while the SVD method might work equally well on both.