If matrix A maps each input vector of X to an output vector whose first entry corresponds to Y, then subtracting multiples of the first row from every other row to make them orthogonal to the first row, and then deleting the first row, would leave a matrix whose row space consists of input vectors that keep Y at 0, and whose column space is the set of outputs still reachable. If you fix some distribution on the inputs of X (such as a normal distribution with a given covariance matrix), whether this is losslessly possible should be more interesting.
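A quick numpy sketch of that row operation (the dimensions are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 6))  # maps 6-dim inputs of X to 4 outputs; row 0 is Y

# Subtract multiples of row 0 from every other row to make them orthogonal to it,
# then drop row 0.
r0 = A[0]
B = A[1:] - np.outer(A[1:] @ r0 / (r0 @ r0), r0)

# Rows of B, taken as input directions, now keep Y at 0 ...
assert np.allclose(B @ r0, 0)
# ... and B's column space is the set of remaining outputs still reachable
# while holding Y at 0.
```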
Presumably you wouldn’t be able to figure out the precise value of Y, since Y isn’t connected to X; you could only find an approximate estimate. Though on reflection the outputs are more interesting in a nonlinear graph (which was the context where I originally came up with the idea), since in a linear one all ways of modifying Y are equivalent.
All ways of modifying Y are only equivalent in a dense linear system. Sparsity (in a high-dimensional system) changes that. (That’s a fairly central concept behind this whole project: sparsity is one of the main ingredients necessary for the natural abstraction hypothesis.)
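One concrete toy reading of this (my own construction, not from the discussion): in a sparse system, two input perturbations can move Y identically while having completely different side effects on everything else, so they stop being interchangeable:

```python
import numpy as np

# A sparse linear map: inputs 0 and 1 both move Y (output 0) by the same
# amount, but drive disjoint sets of other outputs.
A = np.zeros((5, 4))
A[0, 0] = 1.0  # input 0 -> Y
A[0, 1] = 1.0  # input 1 -> Y
A[1, 0] = 2.0  # input 0 also drives output 1
A[2, 1] = 3.0  # input 1 also drives output 2

e0, e1 = np.eye(4)[0], np.eye(4)[1]
y_effect_0 = (A @ e0)[0]
y_effect_1 = (A @ e1)[0]
assert y_effect_0 == y_effect_1 == 1.0   # identical effect on Y ...
assert not np.allclose(A @ e0, A @ e1)   # ... but different side effects
```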
I think I phrased it wrong/in a confusing way.

Suppose Y is unidimensional, and you have Y = f(g(X), h(X)), with f, g, and h all linear. Suppose there are two perturbations i and j that X can emit, where g is only sensitive to i and h is only sensitive to j, i.e. g(j) = 0, h(i) = 0. Then because the system is linear, you can extract them from the rest:

Y = f(g(X+ai+bj), h(X+ai+bj)) = f(g(X), h(X)) + a·f(g(i), 0) + b·f(0, h(j))
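To sanity-check that decomposition numerically (the dimensions and the null-space construction here are my own choices; any linear f, g, h should do):

```python
import numpy as np

rng = np.random.default_rng(1)
G = rng.normal(size=(3, 5))   # g
H = rng.normal(size=(2, 5))   # h
F = rng.normal(size=(1, 5))   # f, acting on the concatenation (g(X), h(X))

def f(u, v):
    return F @ np.concatenate([u, v])

def null_vec(M):
    # a vector in the null space of M (last right singular vector)
    _, _, vt = np.linalg.svd(M)
    return vt[-1]

i = null_vec(H)  # h(i) = 0
j = null_vec(G)  # g(j) = 0
assert np.allclose(H @ i, 0) and np.allclose(G @ j, 0)

x = rng.normal(size=5)
a, b = 0.7, -1.3
lhs = f(G @ (x + a*i + b*j), H @ (x + a*i + b*j))
rhs = f(G @ x, H @ x) + a * f(G @ i, np.zeros(2)) + b * f(np.zeros(3), H @ j)
assert np.allclose(lhs, rhs)  # the two perturbations separate out additively
```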
This means that if X only cares about Y, it is free to choose whether to adjust a or to adjust b. In a nonlinear system, there might be all sorts of things like moderators, diminishing returns, etc., which would make it matter whether it tried to control Y using a or using b; but in a linear system, it can just do whatever.
Oh I see. Yeah, if either X or Y is unidimensional, then any linear model is really boring. They need to be high-dimensional to do anything interesting.
They need to be high-dimensional for the linear models themselves to do anything interesting, but I think adding a large number of low-dimensional linear models might, despite being boring, still change the dynamics of the graphs to be marginally more realistic for settings involving optimization. X turns into an estimate of Y and tries to control this estimate towards zero; that’s a pattern I assume would be rare in your graph, but common in reality. It could lead to real graphs exhibiting certain “conspiracies” that the model graphs might lack, especially if there are many (X, Y) pairs, or many (individually unidimensional) Xs that all try to control a single common Y.
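A minimal toy of the pattern I mean (every number here is made up for illustration): many unidimensional Xs share one Y, and each nudges its own output to push its estimate of Y toward zero:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
w = rng.normal(size=n)   # how each X's output feeds into the common Y
x = rng.normal(size=n)   # each X's (unidimensional) output
lr = 0.1 / n

y0 = abs(w @ x)
for _ in range(500):
    y_est = w @ x            # each X's estimate of Y (exact, in this toy)
    x -= lr * w * y_est      # each X nudges its output to shrink that estimate

# collectively the Xs drive the shared Y to zero
assert abs(w @ x) < 1e-6
```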
But there are probably a lot of things that could be investigated about this. I should probably be working on getting my system for this working, or something. Gonna be exciting to see what else you figure out re natural abstractions.
Y can consist of multiple variables, and then there would always be multiple ways, right? I thought by indirect you meant that the path between X and Y was longer than 1. If some third cause is directly upstream from both, then I suppose it wouldn’t be uniquely defined whether changing X changes Y, since there could be directions in which to change the cause that change some subset of X and Y.
Y can consist of multiple variables, and then there would always be multiple ways, right?
Not necessarily. For instance if X has only one output, then there’s only one way for X to change things, even if the one output connects to multiple Ys.
I thought by indirect you meant that the path between X and Y was longer than 1.
Yes.
If some third cause is directly upstream from both, then I suppose it wouldn’t be uniquely defined whether changing X changes Y, since there could be directions in which to change the cause that change some subset of X and Y.
I’m not sure I get it, or at least if I get it I don’t agree.
Are you saying that if we’ve got X ← Z → Y and X → Y, then the effect of X on Y may not be well-defined, because it depends on whether the effect goes through Z, since the Z → Y path becomes relevant when it does?
Because if so I think I disagree. The effect of X on Y should only count the X → Y path, not the X ← Z → Y path, as the latter is a confounder rather than a true causal path.
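This is easy to see in a quick simulation (coefficients and noise model made up): regressing Y on X picks up the confounded X ← Z → Y path, while intervening on X recovers only the X → Y coefficient:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
beta, gamma = 2.0, 5.0           # X -> Y and Z -> Y coefficients (made up)

# X <- Z -> Y together with X -> Y
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = beta * x + gamma * z + rng.normal(size=n)

# Observational slope mixes in the confounded path:
obs_slope = np.cov(x, y)[0, 1] / np.var(x)

# Interventional slope, do(X): setting X independently of Z cuts the Z -> X edge
x_do = rng.normal(size=n)
y_do = beta * x_do + gamma * z + rng.normal(size=n)
int_slope = np.cov(x_do, y_do)[0, 1] / np.var(x_do)

assert abs(int_slope - beta) < 0.1   # intervention recovers the X -> Y path
assert obs_slope - beta > 1.0        # regression overstates it via Z
```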