I have been wondering whether neural networks (or, more specifically, transformers) will become the ultimate form of AGI. If not, will the existing research on interpretability become obsolete?
I do not worry a lot about this. It would be a problem, but some methods are model-agnostic and would transfer fine, and others have close analogs for other architectures. For example, ROME is specific to transformers, but causal tracing and rank-one editing are more general principles that are not tied to that architecture.
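To illustrate the general principle (this is not the ROME procedure itself): a rank-one edit adds an outer product to a weight matrix so that one chosen "key" direction maps to a new target value, leaving other directions largely untouched. The sketch below is a minimal NumPy illustration; the matrix shapes, the names `k` and `v_new`, and the simple projection-based update are assumptions for the example only.

```python
import numpy as np

# Minimal sketch of a rank-one edit on an arbitrary linear map W (d_out x d_in).
# Goal: make the edited W map a chosen key vector k to a chosen value v_new,
# while confining the change to the span of k.
rng = np.random.default_rng(0)
d_out, d_in = 8, 16
W = rng.normal(size=(d_out, d_in))

k = rng.normal(size=d_in)        # "key" direction whose output we want to change
v_new = rng.normal(size=d_out)   # desired output for that key

# Rank-one update: W' = W + (v_new - W k) k^T / (k^T k), so that W' k = v_new.
delta = np.outer(v_new - W @ k, k) / (k @ k)
W_edited = W + delta

print(np.allclose(W_edited @ k, v_new))  # True: the key now maps to the new value
```

Nothing in this update depends on the layer living inside a transformer, which is why the principle carries over to other architectures even if the specific localization step in ROME does not.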