I have been wondering whether neural networks (or, more specifically, transformers) will become the ultimate form of AGI. If not, will the existing research on interpretability become obsolete?
I do not worry a lot about this. It would be a problem, but some methods are model-agnostic and would transfer fine, and others have close analogs for other architectures. For example, ROME is specific to transformers, but causal tracing and rank-one editing are more general principles that are not tied to that architecture.
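To illustrate the general principle (this is not the ROME procedure itself): a rank-one edit adds an outer product to a weight matrix so that one chosen "key" direction maps to a new target value, leaving other directions largely untouched. The sketch below is a minimal NumPy illustration; the matrix shapes, the names `k` and `v_new`, and the simple projection-based update are assumptions for the example only.

```python
import numpy as np

# Minimal sketch of a rank-one edit on an arbitrary linear map W (d_out x d_in).
# Goal: make the edited W map a chosen key vector k to a chosen value v_new,
# while confining the change to the span of k.
rng = np.random.default_rng(0)
d_out, d_in = 8, 16
W = rng.normal(size=(d_out, d_in))

k = rng.normal(size=d_in)        # "key" direction whose output we want to change
v_new = rng.normal(size=d_out)   # desired output for that key

# Rank-one update: W' = W + (v_new - W k) k^T / (k^T k), so that W' k = v_new.
delta = np.outer(v_new - W @ k, k) / (k @ k)
W_edited = W + delta

print(np.allclose(W_edited @ k, v_new))  # True: the key now maps to the new value
```

Nothing in this update depends on the layer living inside a transformer, which is why the principle carries over to other architectures even if the specific localization step in ROME does not.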