Seth Herd comments on Agentized LLMs will change the alignment landscape

Seth Herd 11 Apr 2023 18:01 UTC
The interpretability research done so far is still important, and we’ll still need more of it, and better, for the reason you point out: the natural language outputs aren’t a totally trustworthy indicator of the semantics underneath. But those outputs are a big help, and they pose a new challenge for interpretability.