Lucius Bushnaq comments on The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks

Lucius Bushnaq 21 May 2024 7:43 UTC
6 points
I doubt it. Evaluating gradients along an entire trajectory from a baseline gave qualitatively similar results.
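For anyone unsure what the distinction is, here is a minimal sketch of "gradient at a point" versus "gradients along a trajectory from a baseline". It is a toy scalar example, not the setup from the post: the function (a two-way softmax, i.e. a sigmoid), the baseline of 0, the straight-line path, and all names are assumptions made for illustration. It only shows how the two quantities are computed and says nothing about the results on real models.

```python
import math

def sigmoid(z):
    # Two-way softmax over logits [z, 0] reduces to a sigmoid.
    return 1.0 / (1.0 + math.exp(-z))

def grad_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def path_integrated_gradient(x, baseline=0.0, steps=1000):
    # Midpoint Riemann sum of the gradient along the straight line
    # from `baseline` to `x`, scaled by the displacement (x - baseline).
    total = 0.0
    for k in range(steps):
        alpha = (k + 0.5) / steps
        total += grad_sigmoid(baseline + alpha * (x - baseline))
    return (x - baseline) * total / steps

x = 10.0                                  # hypothetical, deep in the saturated regime
local_grad = grad_sigmoid(x)              # gradient evaluated only at the endpoint
path_grad = path_integrated_gradient(x)   # gradient accumulated along the path

print(local_grad, path_grad, sigmoid(x) - sigmoid(0.0))
```

In this toy case the path-integrated quantity recovers the total change in output between baseline and endpoint, while the endpoint gradient alone is close to zero.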
A saturated softmax also really does induce insensitivity to small changes. If two nodes are always connected by a saturated softmax, they can’t be exchanging more than one bit of information. Though the importance of that bit can be large.
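To make the insensitivity concrete, here is a rough sketch comparing the softmax Jacobian for an unsaturated and a saturated set of logits. The logit values are arbitrary choices for illustration, and the helper names are hypothetical; the Jacobian formula itself, entry (i, j) equal to p_i (δ_ij − p_j), is the standard one.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(logits):
    # d softmax_i / d logit_j = p_i * (delta_ij - p_j)
    p = softmax(logits)
    n = len(p)
    return [[p[i] * ((1.0 if i == j else 0.0) - p[j]) for j in range(n)]
            for i in range(n)]

def max_abs_entry(matrix):
    return max(abs(v) for row in matrix for v in row)

# Illustrative logits: one mild, one where the first logit dominates.
unsaturated = max_abs_entry(softmax_jacobian([1.0, 0.0, 0.0]))
saturated = max_abs_entry(softmax_jacobian([20.0, 0.0, 0.0]))

print(unsaturated, saturated)
```

With the dominant logit far ahead, every Jacobian entry is many orders of magnitude smaller than in the unsaturated case, so small perturbations to the logits barely move the output.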

My best guess for why the Interaction Basis didn’t work is that sparse, overcomplete representations really are a thing. So in general, you’re not going to get a good decomposition of LMs from a Cartesian basis of activation space.
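A toy picture of why an overcomplete set of features can't all line up with one orthogonal basis, under assumptions that are purely illustrative: a 2-D activation space with three unit-norm feature directions spaced 120° apart. Nothing here is taken from the post's experiments. The sketch sweeps over rotated orthonormal bases and counts how many features land exactly on a basis axis.

```python
import math

# Hypothetical toy: three feature directions in a 2-D activation space.
feature_angles = [0.0, 2.0 * math.pi / 3.0, 4.0 * math.pi / 3.0]

def coords_in_rotated_basis(angle, theta):
    # Coordinates of the unit vector at `angle` in the orthonormal
    # basis obtained by rotating the standard basis by `theta`.
    return (math.cos(angle - theta), math.sin(angle - theta))

def num_axis_aligned(theta, tol=1e-9):
    # A feature is axis-aligned if one of its two coordinates vanishes.
    count = 0
    for a in feature_angles:
        c0, c1 = coords_in_rotated_basis(a, theta)
        if abs(c0) < tol or abs(c1) < tol:
            count += 1
    return count

# Sweep rotations in 0.1 degree steps.
thetas = [2.0 * math.pi * k / 3600 for k in range(3600)]
best = max(num_axis_aligned(t) for t in thetas)

print(best)
```

In this toy, no choice of basis puts more than one of the three features on an axis, so the others always show up as a mix of basis directions.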