I think this post gives a good description of a way of thinking about the usefulness of transparency and interpretability for AI alignment, one that strikes me as underrated by the LW-y AI safety community.