jenny comments on A comparison of causal scrubbing, causal abstractions, and related methods

jenny 18 Jul 2023 21:37 UTC
LW: 6 AF: 4
This is a nice comparison. I particularly like the images :) and that the comparisons are drawn setting aside historical accidents.
A few comments that came to mind as I was reading:
Perform an interchange intervention on the treeification of L such that the corresponding intervention in the treeification of H would not change any values.
As far as I saw, you don’t mention how causal scrubbing specifies which interchange interventions to select (the answer is: those preserving the distribution of inputs to nodes in H, see e.g. the Appendix post). I think this is an important point: causal scrubbing provides an opinion on which interventions you should do in order to judge your hypothesis, not just how you should do them.
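To make the selection rule concrete, here is a minimal sketch under toy assumptions, not the actual causal scrubbing implementation. The names `sample_agreeing_input`, `dataset`, and `parity` are hypothetical: the replacement input is drawn from the dataset, conditioned on agreeing with the original input on the value the hypothesis assigns to the node.

```python
import random

def sample_agreeing_input(x, dataset, feature, rng):
    """Draw a dataset input whose hypothesized feature value matches that of x.

    Assumes x is itself in the dataset, so there is always at least one candidate.
    """
    target = feature(x)
    candidates = [x2 for x2 in dataset if feature(x2) == target]
    return rng.choice(candidates)

# Hypothetical toy setup: the hypothesis claims a node only tracks parity.
dataset = list(range(10))

def parity(n):
    return n % 2

rng = random.Random(0)
x = 3
x_swap = sample_agreeing_input(x, dataset, parity, rng)
```

Because the swap is sampled from the dataset itself rather than chosen by hand, the inputs each node sees stay on-distribution, which is one way to read the point that the method has an opinion on which interventions to run.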
We need some way of turning a neural network into a graph L, i.e. we need to decide what the individual nodes should be. We won’t discuss that problem in this post since it is orthogonal to the main algorithms we’re comparing.
I actually think this is reasonably relevant, and is related to treeification. Causal scrubbing encourages writing your graph in whatever way you want: there is no reason to think the “normal” network topology is privileged, e.g. that heads are the right unit of abstraction. For example, in causal scrubbing we frequently split the output of a head into different subspaces, or even write it as computing a function plus an error term.
TBC, other methods could also operate on a rewritten, treeified graph, but they don’t encourage it and idk if authors/proponents would endorse it.
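As an illustration of the “function plus an error term” rewrite, here is a small sketch with made-up functions (`head_output`, `hypothesized_part`, and `error_term` are hypothetical stand-ins, not real model components). The rewritten graph has two nodes where the original had one, but it computes the same value on every input.

```python
def head_output(x):
    # Hypothetical stand-in for what a head actually writes.
    return [2.0 * x + 0.1, -x]

def hypothesized_part(x):
    # The function the hypothesis claims the head computes.
    return [2.0 * x, -x]

def error_term(x):
    # Whatever the hypothesized function fails to capture.
    return [h - g for h, g in zip(head_output(x), hypothesized_part(x))]

def rewritten_head_output(x):
    # Same value as head_output (up to float rounding), now as two separate nodes.
    return [g + e for g, e in zip(hypothesized_part(x), error_term(x))]
```

A hypothesis can then say something different about each node, for instance that the error node is unimportant and can be resampled freely.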
Treeification is the one way in which causal scrubbing is stricter than all the other methods.
Related to the above comment: I actually don’t think of treeification as making it stricter, rather just more expressive. It allows you to write down hypotheses from a richer space to reflect what you actually think the network is doing (e.g. head 0 in layer 0 is only relevant for head 5 in layer 1, otherwise it’s unimportant).
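Here is a toy sketch of that kind of hypothesis, assuming a made-up three-node graph (`h00`, `h15`, `h16` are hypothetical stand-ins for head 0 in layer 0 and two heads in layer 1). Treeifying gives each path its own copy of `h00` with its own input, which is what lets you say “`h00` matters via `h15` but not via `h16`”.

```python
def h00(x):
    return x % 4

def h15(a):
    return 10 * a

def h16(a, x):
    # In this toy, h16 happens to ignore the output of h00.
    return x // 4

def model(x):
    # Original graph: one h00 node feeding both h15 and h16.
    a = h00(x)
    return h15(a) + h16(a, x)

def treeified(x_h00_to_h15, x_h00_to_h16, x_h16):
    # Treeified graph: one copy of h00 per path, each with its own input.
    return h15(h00(x_h00_to_h15)) + h16(h00(x_h00_to_h16), x_h16)
```

With all three inputs equal, `treeified` agrees with `model`. The hypothesis is then tested by feeding an unrelated input only to the `h00` copy on the `h16` path and checking that the output does not move, something the untreeified graph has no way to express.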
Recall that causal scrubbing only allows interventions that don’t change any of the values in the explanation H.
IMO this isn’t a fundamental property of causal scrubbing (I agree this isn’t mentioned anywhere, so you’re not wrong in pointing out this difference; but I also want to note which are the deepest differences and which are more of “no one has gotten around to writing up that extension yet”).
- Erik Jenner 23 Jul 2023 18:26 UTC
  LW: 2 AF: 1
  Thanks! Mostly agree with your comments.
  I actually think this is reasonably relevant, and is related to treeification.
  I think any combination of {rewriting, using some canonical form} and {treeification, no treeification} is at least possible, and they all seem sort of reasonable. Do you mean the relation is that both rewriting and treeification give you more expressiveness/more precise hypotheses? If so, I agree for treeification, not sure for rewriting. If we allow literally arbitrary extensional rewrites, then that does increase the number of different hypotheses we can make, but these hypotheses can’t be understood as making precise claims about the original computation anymore. I could even see an argument that allowing rewrites in some sense always makes hypotheses less precise, but I feel pretty confused about what rewrites even are given that there might be no canonical topology for the original computation.
  - jenny 2 Aug 2023 19:25 UTC
    LW: 1 AF: 1
    Not sure if I’m fully responding to your q but...
    there might be no canonical topology for the original computation
    This sounds right to me, and overall I mostly think of treeification as just a kind of extensional rewrite (plus adding more inputs).
    these hypotheses can’t be understood as making precise claims about the original computation anymore
    I think of the underlying graph as providing some combination of 1) causal relationships, and 2) smaller pieces to help with search/reasoning, rather than being an object we inherently care about. (It’s possibly useful to think of hypotheses more as making predictions about the behavior but idk.)
    I do agree that in some applications you might want to restrict which rewrites (including treeification!) are allowed. E.g., in MAD for ELK we might want to make use of the fact that there is a single “diamond” (which may be ~distributed, but not ~duplicated) upstream of all the sensors.