If you want to get attributions between all pairs of basis elements/features in two layers, attributions based on the effect of a marginal ablation will cost you d^2 forward passes, where d is the number of features in a layer. Integrated gradients will take O(d) backward passes, and if you’re willing to write custom code that exploits the specific form of the layer transition, it can take less than that.
If you’re averaging over a data set, IG is also amenable to additional cost reduction through stochastic source techniques.
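To make the O(d) claim concrete, here is a minimal NumPy sketch of integrated gradients between two layers. The toy layer, the finite-difference stand-in for a backward pass, and all names (`layer`, `n_alpha`, etc.) are hypothetical illustrations, not anyone's actual implementation; the point is that each integration step yields a full Jacobian row per output, so all d_in × d_out pairs come from d_out backward passes per step rather than one pass per pair.

```python
import numpy as np

def layer(x):
    # Toy "layer transition": a fixed linear map followed by a nonlinearity.
    W = np.array([[1.0, 2.0], [0.5, -1.0]])
    return np.tanh(W @ x)

def jacobian(x, eps=1e-6):
    # Central finite differences stand in for backward passes here; in a real
    # model each row of J would come from one backward pass on one output.
    d = x.size
    J = np.zeros((layer(x).size, d))
    for i in range(d):
        e = np.zeros(d)
        e[i] = eps
        J[:, i] = (layer(x + e) - layer(x - e)) / (2 * eps)
    return J

def integrated_gradients(x, baseline, n_alpha=64):
    # attr[j, i]: attribution of input feature i to output feature j.
    attr = np.zeros((layer(x).size, x.size))
    for k in range(n_alpha):
        alpha = (k + 0.5) / n_alpha  # midpoint rule along the straight path
        attr += jacobian(baseline + alpha * (x - baseline))
    return attr / n_alpha * (x - baseline)  # broadcasts over columns

x = np.array([1.0, -0.5])
baseline = np.zeros(2)
attr = integrated_gradients(x, baseline)
# Completeness: rows sum to layer(x) - layer(baseline), up to integration error.
print(np.allclose(attr.sum(axis=1), layer(x) - layer(baseline), atol=1e-3))
```

The completeness check at the end is the usual IG sanity test: the attributions along each output row should sum to the total change in that output between the baseline and the input.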
Maybe I’m confused, but isn’t integrated gradients strictly slower than an ablation to a baseline?
For a single interaction, yes (one forward pass vs. an integral approximated with n_alpha integration steps, each requiring a backward pass).
For many interactions (e.g. all connections between two layers), IG can be faster:
Ablation requires d_embed^2 forward passes (if you want to get the effect of every patch on the loss)
Integrated gradients requires d_embed * n_alpha forward & backward passes
(This is assuming you do path patching rather than “edge patching”, which you should in this scenario.)
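A sketch of where the d_embed^2 comes from on the ablation side: each (upstream feature, downstream feature) edge needs its own patched forward pass to get that edge's effect on the loss. The toy two-layer model, the loss, and all names below are hypothetical, chosen only to make the pass-counting explicit.

```python
import numpy as np

# Toy two-layer transition: downstream features z = tanh(W @ a), loss = sum(z^2).
W = np.array([[1.0, 2.0], [0.5, -1.0]])

def loss_with_patch(a, a_base, i=None, j=None):
    # One forward pass; optionally patch the single edge a_i -> z_j so that
    # z_j sees the baseline value of feature i while everything else sees a.
    pre = W @ a
    if i is not None:
        pre[j] += W[j, i] * (a_base[i] - a[i])
    z = np.tanh(pre)
    return float((z ** 2).sum())

def ablation_edge_attributions(a, a_base):
    # attr[j, i]: loss change from patching edge i -> j to its baseline.
    # One forward pass per (i, j) pair, hence d_in * d_out (~ d_embed^2)
    # forward passes for all pairs -- the cost quoted above.
    d_in, d_out = a.size, W.shape[0]
    base_loss = loss_with_patch(a, a_base)
    attr = np.zeros((d_out, d_in))
    for i in range(d_in):
        for j in range(d_out):
            attr[j, i] = loss_with_patch(a, a_base, i, j) - base_loss
    return attr

a = np.array([1.0, -0.5])
attr = ablation_edge_attributions(a, np.zeros(2))
```

Contrast with the IG accounting above: here the loop over pairs is unavoidable because each patch changes the forward computation, whereas gradients deliver a whole row of pairwise effects per backward pass.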
Sam Marks makes a similar point in Sparse Feature Circuits, near equations (2), (3), and (4).