I am not sure how new this approach is (for simplified Transformers, the original AMFOTC paper has several sections with “Path Expansion” in their titles, which seem to do something very similar for a reduced set of transformations, and their formalism of “virtual attention heads” also seems to be in that spirit).
Fair point, and I should amend the post to point out that AMFOTC also does ‘path expansion’. However, I think this is still conceptually distinct from AMFOTC because:
In my reading of AMFOTC, the focus seems to be on understanding attention by separating the QK and OV circuits, writing these as linear (or almost linear) terms, and fleshing this out for 1-2 layer attention-only transformers. This is cool, but also very hard to use at the level of a full model.
Beyond understanding individual attention heads, I am more interested in how the whole model works; IMO this is very unlikely to be simply understood as a sum of linear components. OTOH, residual expansion gives a sum of nonlinear components, and perhaps each of those components is individually more interpretable.
I think the notion of path ‘degrees’ hasn’t been explicitly stated before, and I found it to be a useful abstraction for thinking about circuit complexity (see the toy sketch at the end of this comment).
Maybe this post is better framed as ‘reconciling AMFOTC with SAE circuit analysis’.
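To make the last two points a bit more concrete, here is a minimal toy sketch (my own illustrative construction, not code from the post or from AMFOTC) of expanding a two-block residual stack into a sum of path terms and grouping them by degree, i.e. by how many blocks a path passes through:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

# Two toy nonlinear residual blocks, stand-ins for attention/MLP layers.
W1 = rng.normal(size=(d, d)) / np.sqrt(d)
W2 = rng.normal(size=(d, d)) / np.sqrt(d)
f1 = lambda x: np.tanh(x @ W1)
f2 = lambda x: np.tanh(x @ W2)

x = rng.normal(size=d)

# Full two-block residual stack: out = x + f1(x) + f2(x + f1(x)).
out = x + f1(x) + f2(x + f1(x))

# Residual expansion: write the output as a sum of path terms,
# where "degree" counts how many blocks a path goes through.
paths = {
    "identity     (degree 0)": x,
    "f1 only      (degree 1)": f1(x),
    "f2 only      (degree 1)": f2(x),
    # The degree-2 term is genuinely nonlinear: it is NOT simply f2(f1(x)).
    "f2 after f1  (degree 2)": f2(x + f1(x)) - f2(x),
}

# The path terms sum back exactly to the full output.
assert np.allclose(out, sum(paths.values()))
for name, term in paths.items():
    print(name, round(float(np.linalg.norm(term)), 3))
```

Under this convention the higher-degree terms are exactly the nonlinear ‘interaction’ pieces, which is what I mean by the expansion being a sum of nonlinear (rather than linear) components.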
Yes, I think this makes sense.
Here is one aspect which might be useful to keep in mind.
If we think about all this as some kind of “generalized Taylor expansion”, there are some indications that the deviations from linearity might be small.
E.g. there is this rather famous post, https://www.lesswrong.com/posts/JK9nxcBhQfzEgjjqe/deep-learning-models-might-be-secretly-almost-linear.
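One toy way to quantify this (purely illustrative; whether the deviation is actually small for any given real model component is an empirical question) is to check how far a component is from being affine, e.g. by comparing f(a + b) with f(a) + f(b) − f(0):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 64

# A toy ReLU MLP block standing in for a model component.
W_in = rng.normal(size=(d, 4 * d)) / np.sqrt(d)
W_out = rng.normal(size=(4 * d, d)) / np.sqrt(4 * d)
f = lambda x: np.maximum(x @ W_in, 0.0) @ W_out

a, b = rng.normal(size=d), rng.normal(size=d)

# If f were affine, f(a + b) would equal f(a) + f(b) - f(0).
deviation = f(a + b) - (f(a) + f(b) - f(np.zeros(d)))
rel_err = np.linalg.norm(deviation) / np.linalg.norm(f(a + b))
print(f"relative deviation from affinity: {rel_err:.3f}")
```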
Another indication pointing to “almost linearity” is that model merging works pretty well. Interestingly, though, people often prefer to approach merging in a more subtle fashion than plain linear interpolation (see e.g. https://huggingface.co/blog/mlabonne/merge-models), so presumably non-linearity does matter quite a bit as well.
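For reference, the simplest form of model merging is just a parameter-wise linear interpolation of two compatible checkpoints; the more subtle schemes discussed in the linked post are refinements on top of this. A minimal PyTorch sketch (the helper name lerp_state_dicts is my own, not from that post):

```python
import torch
import torch.nn as nn

def lerp_state_dicts(sd_a, sd_b, t=0.5):
    """Parameter-wise linear interpolation of two state dicts with matching keys/shapes."""
    return {k: torch.lerp(sd_a[k], sd_b[k], t) for k in sd_a}

# Two toy "fine-tunes" sharing an architecture.
model_a, model_b = nn.Linear(8, 8), nn.Linear(8, 8)

merged = nn.Linear(8, 8)
merged.load_state_dict(lerp_state_dicts(model_a.state_dict(), model_b.state_dict(), t=0.5))
```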