You’re saying that transformers are key to alignment research?
I would imagine that latent space exploration and explanation are a useful part of interpretability, and that developing techniques that work for both language and images improves the chance they will generalize to new neural architectures.
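To make "latent space exploration" concrete, here is a minimal sketch of latent interpolation, one common way to probe a learned representation. The encoder and decoder below are hypothetical stand-ins (random linear maps, with made-up dimensions); in practice they would come from a trained model, and the same procedure applies whether the inputs are text embeddings or flattened images.

```python
import numpy as np

rng = np.random.default_rng(0)

INPUT_DIM, LATENT_DIM = 64, 8          # hypothetical sizes for illustration
W_enc = rng.normal(size=(LATENT_DIM, INPUT_DIM))  # placeholder encoder weights
W_dec = rng.normal(size=(INPUT_DIM, LATENT_DIM))  # placeholder decoder weights

def encode(x: np.ndarray) -> np.ndarray:
    """Map an input vector into the latent space."""
    return W_enc @ x

def decode(z: np.ndarray) -> np.ndarray:
    """Map a latent vector back to input space."""
    return W_dec @ z

# Two arbitrary inputs (stand-ins for embedded text or flattened images).
x_a = rng.normal(size=INPUT_DIM)
x_b = rng.normal(size=INPUT_DIM)
z_a, z_b = encode(x_a), encode(x_b)

# Walk the straight line between the two latent codes and decode each step.
# Watching how the decoded output changes along the path is one way to see
# what directions in latent space correspond to.
for alpha in np.linspace(0.0, 1.0, 5):
    z = (1 - alpha) * z_a + alpha * z_b
    x_hat = decode(z)
    print(f"alpha={alpha:.2f}  decoded-norm={np.linalg.norm(x_hat):.3f}")
```

The point of the sketch is that nothing in it depends on the modality or the architecture that produced the latent space, which is why techniques of this kind have a chance of carrying over to new ones.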