scasper comments on EIS VI: Critiques of Mechanistic Interpretability Work in AI Safety

scasper 6 Mar 2023 17:12 UTC
I do not worry much about this. It would be a problem, but some methods are model-agnostic and would transfer fine, and others have close analogs for other architectures. For example, ROME is specific to transformers, but causal tracing and rank-one editing are more general principles that are not.
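
To make the "more general principle" point concrete, here is a minimal sketch of a rank-one edit on an arbitrary linear map, with no transformer in sight. It is not ROME's actual update (which, as I understand it, also weights the key by a covariance estimate); it is the plain minimum-norm version. The names `rank_one_edit`, `W`, `k`, and `v` are made up for illustration.

```python
import numpy as np

def rank_one_edit(W, k, v):
    """Return W' = W + (v - W k) k^T / (k^T k), so that W' @ k == v.

    Illustrative only: a plain rank-one update of a linear map,
    not the covariance-weighted update used by ROME.
    """
    k = np.asarray(k, dtype=float)
    v = np.asarray(v, dtype=float)
    residual = v - W @ k                      # what the current map gets wrong on k
    return W + np.outer(residual, k) / (k @ k)

# Any linear layer works here; nothing below assumes a particular architecture.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # stand-in weight matrix
k = rng.normal(size=3)        # "key" input whose output we want to change
v = rng.normal(size=4)        # desired output for that key

W_new = rank_one_edit(W, k, v)
```

The edited matrix maps `k` to `v` exactly, differs from `W` by a rank-one matrix, and leaves the map unchanged on inputs orthogonal to `k`. The same update applies to any layer that is a matrix multiply, which is the sense in which the principle is not tied to transformers.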