If you want to get attributions between all pairs of basis elements/features in two layers, attributions based on the effect of a marginal ablation will take you d2 forward passes, where d is the number of features in a layer. Integrated gradients will take O(d) backward passes, and if you’re willing to write custom code that exploits the specific form of the layer transition, it can take less than that.
If you’re averaging over a data set, IG is also amendable to additional cost reduction through stochastic source techniques.
If you want to get attributions between all pairs of basis elements/features in two layers, attributions based on the effect of a marginal ablation will take you d2 forward passes, where d is the number of features in a layer. Integrated gradients will take O(d) backward passes, and if you’re willing to write custom code that exploits the specific form of the layer transition, it can take less than that.
If you’re averaging over a data set, IG is also amendable to additional cost reduction through stochastic source techniques.