Great post! I don’t think Chris Olah’s work is a good example of non-transferable principles though. His team was able to make a lot of progress on transformer interpretability in a relatively short time, and I expect that there was a lot of transfer of skills and principles from the work on image nets that made this possible. For example, the idea of circuits and the “universality of circuits” principle seems to have transferred to transformers pretty well.