Sorry for the delay, and thanks for this! Yeah I agree, in general the OV circuit seems like it’ll be much easier since it doesn’t have the bilinearity or the softmax issue. I think the idea you sketch here sounds really promising and pretty in line with some of the things we’re trying atm.
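To make the contrast concrete, here's a toy sketch (single head, random weights, no causal mask; all the names and shapes are made up for illustration, not taken from any real model). It just checks that the OV map is linear in the residual stream while the QK attention pattern is not:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_head, n_tokens = 8, 4, 5  # hypothetical toy sizes

# Hypothetical single-head weights, randomly initialised.
W_Q = rng.normal(size=(d_model, d_head))
W_K = rng.normal(size=(d_model, d_head))
W_V = rng.normal(size=(d_model, d_head))
W_O = rng.normal(size=(d_head, d_model))


def ov(x):
    """OV circuit: one linear map applied to every source token."""
    return x @ W_V @ W_O


def softmax(z):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def qk_pattern(x):
    """QK circuit: scores are bilinear in x, then pushed through a softmax."""
    scores = (x @ W_Q) @ (x @ W_K).T / np.sqrt(d_head)
    return softmax(scores)


x = rng.normal(size=(n_tokens, d_model))
y = rng.normal(size=(n_tokens, d_model))

# OV respects superposition of inputs; the attention pattern does not.
ov_is_linear = np.allclose(ov(x + y), ov(x) + ov(y))
qk_is_linear = np.allclose(qk_pattern(x + y), qk_pattern(x) + qk_pattern(y))
print(ov_is_linear, qk_is_linear)
```

So in this toy setup a decomposition of the OV map can be added up feature by feature, whereas anything on the QK side has to deal with products of pairs of inputs plus the softmax normalisation.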
I think the tough part will be the next step: somehow “stitching together” the QK and OV decompositions to get an end-to-end understanding of what the whole attention layer is doing. That said, it’s still unclear to me how far we should be treating the QK and OV circuits as totally independent.
Interested to hear more about your work though! Being able to replace the entire model sounds impressive given how much reconstruction errors seem to compound