Lucius Bushnaq comments on Interpretability: Integrated Gradients is a decent attribution method

Lucius Bushnaq 21 May 2024 9:16 UTC
2 points
If you want attributions between all pairs of basis elements/features in two layers, attributions based on the effect of a marginal ablation will take you $d^{2}$ forward passes, where $d$ is the number of features in a layer. Integrated gradients will take $O(d)$ backward passes, and if you’re willing to write custom code that exploits the specific form of the layer transition, it can take fewer than that.
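A minimal sketch of the backward-pass count, assuming a toy layer transition $f(x) = \tanh(Wx)$ and a zero baseline. Neither is specified in the comment, and the names `layer`, `backward_pass`, `ig_attributions` and `ig_closed_form` are hypothetical. Each gradient of one output feature with respect to all inputs stands in for one backward pass, so the full $d \times d$ attribution matrix costs $d$ backward passes per integration point. Reading the $O(d)$ figure as treating the number of integration steps as a constant is an interpretation, not something the comment states. The last function shows one way the form of the layer can be exploited: for this particular layer the path integral has a closed form and needs no backward passes at all.

```python
import math
import random


def layer(W, x):
    """Toy layer transition (an assumption): f(x) = tanh(W x)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]


def backward_pass(W, x, j):
    """Gradient of output j with respect to all inputs at x.

    Stands in for one backward pass; written by hand for the toy layer.
    """
    z_j = sum(w * xi for w, xi in zip(W[j], x))
    slope = 1.0 - math.tanh(z_j) ** 2
    return [slope * w for w in W[j]]


def ig_attributions(W, x, baseline, steps=64):
    """attr[j][i]: integrated-gradients attribution of output j to input i."""
    d_out, d_in = len(W), len(x)
    attr = [[0.0] * d_in for _ in range(d_out)]
    n_backward = 0
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint rule along the straight path
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for j in range(d_out):  # one backward pass per output feature
            grad = backward_pass(W, point, j)
            n_backward += 1
            for i in range(d_in):
                attr[j][i] += grad[i] * (x[i] - baseline[i]) / steps
    return attr, n_backward


def ig_closed_form(W, x, baseline):
    """Same attributions for this toy layer, with no backward passes.

    The pre-activation is linear along the path, so the integral of
    1 - tanh(z)^2 reduces to a difference of tanh values.
    """
    attr = []
    for row in W:
        z_x = sum(w * xi for w, xi in zip(row, x))
        z_b = sum(w * b for w, b in zip(row, baseline))
        dz = z_x - z_b
        if abs(dz) > 1e-12:
            mean_slope = (math.tanh(z_x) - math.tanh(z_b)) / dz
        else:
            mean_slope = 1.0 - math.tanh(z_b) ** 2
        attr.append([mean_slope * w * (xi - b)
                     for w, xi, b in zip(row, x, baseline)])
    return attr


rng = random.Random(0)
d = 4
W = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(d)]
x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
baseline = [0.0] * d

attr, n_backward = ig_attributions(W, x, baseline)
closed = ig_closed_form(W, x, baseline)
print("backward passes:", n_backward, "for", d * d, "attribution entries")
```

Each row of `attr` should sum to $f_j(x) - f_j(\text{baseline})$ up to quadrature error, which is a convenient sanity check on either version.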
If you’re averaging over a data set, IG is also amenable to additional cost reduction through stochastic source techniques.
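One reading of the stochastic source idea, sketched under assumptions the comment does not spell out: instead of one backward pass per output feature, each data point gets a single random sign vector $\eta$ with $E[\eta_j \eta_k] = \delta_{jk}$, and one backward pass per integration step computes the contraction $\sum_k \eta_k A_{ki}$. The product $\eta_j \sum_k \eta_k A_{ki}$ is then an unbiased estimate of $A_{ji}$, and the noise averages out together with the data set average. The toy layer $f(x) = \tanh(Wx)$, the zero baseline, the Rademacher noise, and all function names here are hypothetical choices for illustration.

```python
import math
import random


def vjp(W, x, cotangent):
    """One backward pass through f(x) = tanh(W x): returns cotangent^T J(x)."""
    out = [0.0] * len(x)
    for j, row in enumerate(W):
        z_j = sum(w * xi for w, xi in zip(row, x))
        scale = cotangent[j] * (1.0 - math.tanh(z_j) ** 2)
        for i, w in enumerate(row):
            out[i] += scale * w
    return out


def ig_vjp(W, x, baseline, cotangent, steps=16):
    """Integrated gradients contracted with `cotangent` on the output side.

    Returns sum_j cotangent[j] * A[j][i], using one backward pass per step.
    """
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint rule along the straight path
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = vjp(W, point, cotangent)
        for i in range(len(x)):
            total[i] += g[i] * (x[i] - baseline[i]) / steps
    return total


def exact_mean_attr(W, data, baseline):
    """Data set average of A[j][i]: d backward passes per step per point."""
    d_out, d_in = len(W), len(baseline)
    mean = [[0.0] * d_in for _ in range(d_out)]
    for x in data:
        for j in range(d_out):
            e_j = [1.0 if k == j else 0.0 for k in range(d_out)]
            row = ig_vjp(W, x, baseline, e_j)
            for i in range(d_in):
                mean[j][i] += row[i] / len(data)
    return mean


def stochastic_mean_attr(W, data, baseline, rng):
    """Noisy estimate of the same average: 1 backward pass per step per point."""
    d_out, d_in = len(W), len(baseline)
    mean = [[0.0] * d_in for _ in range(d_out)]
    for x in data:
        eta = [rng.choice((-1.0, 1.0)) for _ in range(d_out)]  # random source
        contracted = ig_vjp(W, x, baseline, eta)
        for j in range(d_out):
            for i in range(d_in):
                mean[j][i] += eta[j] * contracted[i] / len(data)
    return mean


rng = random.Random(0)
d = 3
W = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(d)]
data = [[rng.uniform(0.0, 1.0) for _ in range(d)] for _ in range(4000)]
baseline = [0.0] * d

exact = exact_mean_attr(W, data, baseline)
estimate = stochastic_mean_attr(W, data, baseline, rng)
max_err = max(abs(exact[j][i] - estimate[j][i])
              for j in range(d) for i in range(d))
print("max abs error of stochastic estimate:", max_err)
```

The estimate trades a factor of $d$ in backward passes for sampling noise that shrinks roughly like one over the square root of the data set size, so whether it pays off depends on how many data points are being averaged and how small the attributions of interest are.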