Agreed. I do value methods being architecture independent, but mostly just because of this:
and maybe a sign that a method is principled
At scale, different architectures trained on the same data seem to converge to learning similar algorithms to some extent. I care about decomposing and understanding these algorithms, independent of the architecture they happen to be implemented on. If a mech interp method is formulated in a mostly architecture independent manner, I take that as a weakly promising sign that it’s actually finding the structure of the learned algorithm, instead of structure related to the implmentation on one particular architecture.
Agreed. I do value methods being architecture independent, but mostly just because of this:
At scale, different architectures trained on the same data seem to converge to learning similar algorithms to some extent. I care about decomposing and understanding these algorithms, independent of the architecture they happen to be implemented on. If a mech interp method is formulated in a mostly architecture independent manner, I take that as a weakly promising sign that it’s actually finding the structure of the learned algorithm, instead of structure related to the implmentation on one particular architecture.