The interpretability research done so far is still important, and we’ll still need more and better of the same, for the reason you point out. The natural language outputs aren’t a totally trustworthy indicator of the semantics underneath. But they are a big help and a new challenge for interpretability.