Sonia Joseph comments on Influence functions—why, what and how

Sonia Joseph 27 Mar 2024 5:34 UTC
Thank you for this. How do you think about the pros and cons of influence functions versus activation patching or direct logit attribution for localizing a behavior in the model?