I doubt it. Evaluating gradients along an entire trajectory from a baseline gave qualitatively similar results.
A saturated softmax also really does induce insensitivity to small changes. If two nodes are always connected by a saturated softmax, they can’t be exchanging more than one bit of information, though the importance of that bit can be large.
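To make the insensitivity point concrete, here is a minimal sketch in plain Python. The logit values and the helper names `softmax` and `sensitivity` are made up for illustration and are not taken from the experiments discussed here. It nudges one logit by a small amount and compares how much the corresponding softmax output moves in an unsaturated case and a saturated one.

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sensitivity(logits, index=0, eps=1e-3):
    # Finite-difference estimate of d softmax[index] / d logits[index].
    bumped = list(logits)
    bumped[index] += eps
    return (softmax(bumped)[index] - softmax(logits)[index]) / eps

# Illustrative logits (assumed values): uniform vs. one dominant entry.
unsaturated = [0.0, 0.0, 0.0]
saturated = [20.0, 0.0, 0.0]

print("unsaturated:", sensitivity(unsaturated))
print("saturated:  ", sensitivity(saturated))
```

Analytically the derivative is p(1 - p), so it goes to zero as p approaches 1. In the saturated case the output barely responds to the nudge, which is why a local gradient sees almost nothing passing through that connection even when the on/off bit it carries matters a lot.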
My best guess for why the Interaction Basis didn’t work is that sparse, overcomplete representations really are a thing. If a layer represents more features than it has dimensions, no single basis can give each feature its own axis. So in general, you’re not going to get a good decomposition of LMs from a Cartesian basis of activation space.