Neel Nanda comments on Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]

Neel Nanda 4 Dec 2022 12:54 UTC
LW: 2 AF: 1
0
AF
Thanks! Can you give a non-linear decomposition example?
- ryan_greenblatt 5 Dec 2022 1:50 UTC
  LW: 2 AF: 2
  1
  AF
  I would typically call
  
  MLP(x) = f(x) + (MLP(x) - f(x))
  
  a non-linear decomposition, since f(x) is an arbitrary function.
  
  Regardless, any decomposition into a computational graph is fine, as long as we can prove it is extensionally equal to the original. For instance, if MLP(x) = combine(h(x), g(x)) holds for every input x, then I can scrub h(x) and g(x) individually.
  
  One example of this could be a product, e.g., suppose that MLP(x) = h(x) * g(x) (maybe something like SwiGLU).
- Kshitij Sachan 5 Dec 2022 20:51 UTC
  1 point
  0
  We haven’t had to use a non-linear decomposition in our interp work so far at Redwood. Just wanted to point out that it’s possible. I’m not sure when you would want to use one, but I haven’t thought about it that much.
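
For concreteness, here is a minimal sketch of the two decompositions mentioned in this thread, written against a toy gated MLP. Everything in it is an assumption made for illustration: the weights, the choice of f, and the names `mlp_additive` and `mlp_product` are hypothetical, and feeding a branch a different input is only a crude stand-in for scrubbing, not the resampling procedure from the post. Checking agreement on sampled inputs is also weaker than the proof of extensional equality the thread asks for.

```python
import math
import random

def silu(z):
    # SiLU / Swish activation, the gate nonlinearity used in SwiGLU-style layers.
    return z / (1.0 + math.exp(-z))

def matvec(W, x):
    # Plain matrix-vector product on nested lists.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

random.seed(0)
D_IN, D_HID = 3, 4  # toy sizes, chosen arbitrarily
W_GATE = [[random.uniform(-1, 1) for _ in range(D_IN)] for _ in range(D_HID)]
W_UP = [[random.uniform(-1, 1) for _ in range(D_IN)] for _ in range(D_HID)]

def mlp(x):
    # Monolithic reference: a gated MLP with the output projection omitted.
    gate, up = matvec(W_GATE, x), matvec(W_UP, x)
    return [silu(a) * b for a, b in zip(gate, up)]

def h(x):
    # Gate branch of the product decomposition.
    return [silu(z) for z in matvec(W_GATE, x)]

def g(x):
    # Linear branch of the product decomposition.
    return matvec(W_UP, x)

def combine(a, b):
    # Elementwise product, so that mlp(x) == combine(h(x), g(x)).
    return [ai * bi for ai, bi in zip(a, b)]

def f(x):
    # An arbitrary function of x; any f of the right output shape would do.
    return [math.tanh(sum(x))] * D_HID

def residual(x):
    # The error term MLP(x) - f(x).
    return [m - fi for m, fi in zip(mlp(x), f(x))]

def mlp_additive(x, x_f=None, x_res=None):
    # MLP(x) = f(x) + (MLP(x) - f(x)); each term may be fed its own input.
    x_f = x if x_f is None else x_f
    x_res = x if x_res is None else x_res
    return [a + b for a, b in zip(f(x_f), residual(x_res))]

def mlp_product(x, x_h=None, x_g=None):
    # MLP(x) = combine(h(x), g(x)); each branch may be fed its own input.
    x_h = x if x_h is None else x_h
    x_g = x if x_g is None else x_g
    return combine(h(x_h), g(x_g))

def max_abs_diff(a, b):
    return max(abs(ai - bi) for ai, bi in zip(a, b))

def sample_input():
    return [random.uniform(-2, 2) for _ in range(D_IN)]

if __name__ == "__main__":
    x, x_other = sample_input(), sample_input()
    print("additive gap:", max_abs_diff(mlp(x), mlp_additive(x)))
    print("product gap:", max_abs_diff(mlp(x), mlp_product(x)))
    print("effect of swapping h's input:",
          max_abs_diff(mlp(x), mlp_product(x, x_h=x_other)))
```

With no branch inputs overridden, both rewritten graphs should match `mlp` up to floating-point error. Passing `x_h`, `x_g`, `x_f`, or `x_res` swaps the input to a single node while the others still see `x`, which is the handle a scrubbing-style intervention would need.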