lc comments on Agentized LLMs will change the alignment landscape

lc 11 Apr 2023 3:51 UTC
2 points
Those hurdles for interpretability research exist whether or not someone is using AutoGPT to run the LLM. My question is why you think the interpretability research done so far is less useful just because people are prompting the LLM to act agentically directly instead of {some other thing}.
- Seth Herd 11 Apr 2023 18:01 UTC
  1 point
  The interpretability research done so far is still important, and we’ll still need more and better of the same, for the reason you point out. The natural language outputs aren’t a totally trustworthy indicator of the semantics underneath. But they are a big help and a new challenge for interpretability.