We can do that with neural nets too. Although the right way to intervene in e.g. transformers seems to be low-rank adjustments rather than single-node do-ops; see this paper (which is really cool!). And in systems big enough for abstraction to be meaningful, single-node-level interventions are also not the right way to intervene on high-level things, even in probabilistic causal models.
I don’t mean generative models as opposed to neural nets; there are generative neural network models too, e.g. GANs.
The thing is that generative models let you see the effects of the variable not just on the decision that gets made, but also on the observations that get generated. So if your observations are visual, as in e.g. image models, then you can see what part of the image the latent variable you’re intervening on actually affects.
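For concreteness, here’s a minimal sketch of the kind of experiment I have in mind, with a made-up toy "decoder" standing in for a real generative image model (all names, dimensions, and values are invented for illustration): do-intervene on one latent coordinate, regenerate, and look at the per-pixel difference to see which part of the image that latent controls.

```python
import numpy as np

# Toy stand-in for a generative image model: each latent coordinate paints one
# two-column strip of a 16x16 grayscale image. A real GAN generator or VAE
# decoder would replace `decode`; the experiment itself stays the same.
rng = np.random.default_rng(0)
latent_dim, H, W = 8, 16, 16

def decode(z):
    img = np.zeros((H, W))
    for k, z_k in enumerate(z):
        img[:, 2 * k : 2 * k + 2] = z_k  # latent k controls columns 2k and 2k+1
    return img

z = rng.normal(size=latent_dim)
img_before = decode(z)

# do(z_3 := 2.0): intervene on a single latent variable and regenerate.
z_do = z.copy()
z_do[3] = 2.0
img_after = decode(z_do)

# Per-pixel effect map: shows which part of the image the intervened latent controls.
effect_map = np.abs(img_after - img_before)
cols = np.unique(np.argwhere(effect_map > 0)[:, 1])
print("columns affected by intervening on z_3:", cols)
```

With a real model you’d render `effect_map` as a heatmap over the image rather than printing column indices, but the idea is the same.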
And in systems big enough for abstraction to be meaningful, single-node-level interventions are also not the right way to intervene on high-level things, even in probabilistic causal models.
This is true, but it’s also an example of something that would be more obvious in generative models. If you do a single-node-level intervention in a generative image model, you might see that it only affects a small part of the image, or pushes the image off-distribution, or the like, which would show that it doesn’t represent the high-level thing.
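As a toy illustration of that (reusing the made-up strip-decoder from the sketch above, and with both checks being ad-hoc heuristics I’m inventing for the example): you can quantify how many pixels a single-node do-op actually touches, and how far the intervened image drifts from what the model normally generates.

```python
import numpy as np

# Reuses the toy strip-decoder from the sketch above. Both diagnostics below are
# ad-hoc heuristics for the example, not standard measures.
rng = np.random.default_rng(1)
latent_dim, H, W = 8, 16, 16

def decode(z):
    img = np.zeros((H, W))
    for k, z_k in enumerate(z):
        img[:, 2 * k : 2 * k + 2] = z_k
    return img

# Reference statistics of images the model normally generates (latents from the prior).
ref = np.stack([decode(rng.normal(size=latent_dim)) for _ in range(1000)])
mu, sigma = ref.mean(axis=0), ref.std(axis=0) + 1e-8

z = rng.normal(size=latent_dim)
img = decode(z)

z_do = z.copy()
z_do[3] = 25.0                    # an aggressively large single-node do-op
img_do = decode(z_do)

locality = np.mean(np.abs(img_do - img) > 1e-6)    # fraction of pixels the do-op touches
off_dist = np.max(np.abs((img_do - mu) / sigma))   # worst per-pixel z-score vs. reference
print(f"pixels affected: {locality:.0%}, max z-score after intervention: {off_dist:.1f}")
```

A node that really represented a high-level feature should touch a coherent chunk of the image while keeping it looking like something the model would normally generate.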
Although the right way to intervene in e.g. transformers seems to be low-rank adjustments rather than single-node do-ops; see this paper (which is really cool!).
Thanks for the link! I really should get around to reading it 😅
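In the meantime, here’s how I currently picture the distinction in the abstract: a single-node do-op clamps one coordinate of an activation vector, while a (here rank-1) low-rank adjustment edits the component along a chosen direction and leaves the rest alone. This is just my guess at the flavour of the idea, not a claim about what the paper actually does; everything in the snippet is made up for illustration.

```python
import numpy as np

# Toy activation vector standing in for one position's residual stream in a
# transformer layer (all dimensions and values made up).
rng = np.random.default_rng(0)
d_model = 16
h = rng.normal(size=d_model)

# Single-node do-op: clamp one coordinate of the activation.
h_single = h.copy()
h_single[5] = 0.0                       # do(h_5 := 0)

# Rank-1 adjustment: fix the component of h along a direction v to a target
# value, leaving everything orthogonal to v untouched.
v = rng.normal(size=d_model)
v /= np.linalg.norm(v)
target = 0.0                            # desired value of the projection <v, h>
h_rank1 = h + (target - v @ h) * v

# The orthogonal complement of v is unchanged by the rank-1 edit:
print(np.allclose(h_rank1 - (v @ h_rank1) * v, h - (v @ h) * v))  # True
```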