Maybe I’m confused, but isn’t integrated gradients strictly slower than an ablation to a baseline?
For a single interaction, yes: ablation takes 1 forward pass, whereas integrated gradients approximates an integral with n_alpha integration steps, each requiring a backward pass.
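To make the n_alpha cost concrete, here is a minimal sketch of integrated gradients on a toy function. Everything in it is an assumption for illustration: the toy function `f`, its hand-written gradient `grad_f` (standing in for one backward pass), the zero baseline, and the midpoint Riemann sum are not taken from any particular implementation.

```python
import numpy as np

def f(x):
    # Toy scalar "loss" (hypothetical, for illustration only).
    return np.sum(np.tanh(x) ** 2)

def grad_f(x):
    # Analytic gradient of f; stands in for one backward pass.
    t = np.tanh(x)
    return 2.0 * t * (1.0 - t ** 2)

def integrated_gradients(x, baseline, n_alpha=100):
    # Midpoint Riemann sum over the straight path baseline -> x.
    # Each of the n_alpha steps costs one gradient evaluation.
    alphas = (np.arange(n_alpha) + 0.5) / n_alpha
    total = np.zeros_like(x)
    for a in alphas:
        total += grad_f(baseline + a * (x - baseline))
    return (x - baseline) * total / n_alpha

x = np.array([0.5, -1.0, 2.0])
baseline = np.zeros_like(x)
attr = integrated_gradients(x, baseline)

# Completeness check: attributions should sum to roughly f(x) - f(baseline).
print(attr.sum(), f(x) - f(baseline))
```

The loop over `alphas` is where the n_alpha factor comes from: one gradient call yields attributions for every input dimension at once, but it has to be repeated at each integration step.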
For many interactions (e.g. all connections between two layers), integrated gradients can be faster:
Ablation requires d_embed^2 forward passes (if you want to get the effect of every patch on the loss).
Integrated gradients requires d_embed * n_alpha forward & backward passes.
(This is assuming you do path patching rather than “edge patching”, which you should in this scenario.)
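The two pass counts above can be compared with a rough back-of-the-envelope sketch. The function names are hypothetical, and the relative cost of a backward pass (2x a forward pass here) is an assumed rule of thumb, not a figure from this thread.

```python
def ablation_cost(d_embed, forward_cost=1.0):
    # One forward pass per (source, target) pair: d_embed^2 passes.
    return d_embed ** 2 * forward_cost

def ig_cost(d_embed, n_alpha, forward_cost=1.0, backward_cost=2.0):
    # d_embed * n_alpha forward & backward passes.
    # backward_cost=2.0 is an assumption, not a measured value.
    return d_embed * n_alpha * (forward_cost + backward_cost)

for d_embed in (16, 512):
    a = ablation_cost(d_embed)
    g = ig_cost(d_embed, n_alpha=50)
    cheaper = "integrated gradients" if g < a else "ablation"
    print(f"d_embed={d_embed}: ablation={a:.0f}, IG={g:.0f}, cheaper: {cheaper}")
```

Under these assumptions, integrated gradients wins once d_embed exceeds n_alpha times the per-step cost ratio (150 here), so the advantage only appears for reasonably wide layers.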
Sam Marks makes a similar point in Sparse Feature Circuits, near equations (2), (3), and (4).