Cool work! I was going to post that "effect cancellation" is already known and was mentioned in the original post, but, astonishingly to me, it is not! I guess I misremembered.
There's one detail that I'm curious about. CaSc usually compares abs(E[loss] - E[scrubbed loss]), and that of course means it ignores hypotheses which lead the model to do better on some examples and worse on others.
If we instead compare E[abs(loss - scrubbed loss)], does this problem go away? I imagine it doesn't quite, if there are exactly-opposing causes for each individual example, but that seems much harder to happen in practice.
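For concreteness, a minimal numpy sketch of the two metrics (the per-example losses are made up) showing how per-example improvements and regressions cancel under the first metric but not under the second:

```python
import numpy as np

# Hypothetical per-example losses: scrubbing helps on half the examples
# and hurts on the other half by the same amount.
loss = np.array([1.0, 1.0, 1.0, 1.0])
scrubbed_loss = np.array([0.5, 1.5, 0.5, 1.5])

# Metric CaSc usually uses: per-example effects cancel in the means.
abs_of_mean_diff = abs(loss.mean() - scrubbed_loss.mean())   # 0.0

# Alternative metric: per-example differences cannot cancel.
mean_of_abs_diff = np.abs(loss - scrubbed_loss).mean()       # 0.5

print(abs_of_mean_diff, mean_of_abs_diff)
```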
(There's a section on this in the appendix, but it's rather controversial even among the authors.)