Again, we did see agentizing coming. I did, and I'm sure plenty of other people did too; no disagreement there. What we (I and everyone I've read) didn't see, in addition to the alignment upsides, is the other cognitive enhancements easily available with an outer-loop script. I have little doubt that someone saw more than I did, but it doesn't seem to have made it into the collective dialogue. Perhaps that was for infohazard reasons, and I congratulate anyone who saw it and held their tongue. I'll clarify my argument for important cognitive additions beyond agentizing in a post I'm working on now. AutoGPT has one, and there are others that will come quickly now.
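To make the "outer-loop script" idea concrete, here is a minimal toy sketch. Everything here is hypothetical and illustrative only: `fake_llm` stands in for a real LLM API call, and real AutoGPT-style systems parse much richer action formats and insert tool use between calls.

```python
# Toy sketch of an AutoGPT-style outer loop (hypothetical, illustrative only).
# A real system would call an LLM API where fake_llm() stands in.

def fake_llm(prompt: str) -> str:
    """Stand-in for an LLM call; returns a canned plan-then-finish behavior."""
    if "step 1" in prompt:
        return "FINISH: summarized findings"
    return "THINK: break the goal into sub-tasks (step 1)"

def outer_loop(goal: str, max_steps: int = 5) -> list[str]:
    """Repeatedly prompt the model with the goal plus its own prior outputs.

    The loop itself is what adds agency: memory of past steps, a stopping
    condition, and (in real systems) tool calls between model invocations.
    """
    history: list[str] = []
    for _ in range(max_steps):
        prompt = goal + "\n" + "\n".join(history)
        reply = fake_llm(prompt)
        history.append(reply)
        if reply.startswith("FINISH:"):
            break
    return history
```

The point of the sketch is that the cognitive additions live in the wrapper, not the model: the model stays fixed while the script supplies memory, iteration, and termination.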
I did read the simulators post. I agree that interpretability research is still important, but it looks to be very different from most of what's been done to date if this new approach to AGI takes off.
Why?
Because a lot of interpretability will be about parsing gigantic internal trains of thought expressed in natural language. This will probably demand sophisticated AI tools to aid it. Some of it will still be about decoding the representations in the LLM that give rise to that natural language. And there will be a lot of theory and experiment about what causes the internal representation to deviate in meaning from the linguistic output. See this insightful comment by Gwern on pressures for LLMs to use steganography to hide codes in their output. I suspect there are other pressures toward convoluted encoding and outright deception that I haven't thought of. I guess that's not properly considered interpretability, but it will be closely entwined with the interpretability work that tests those theories.
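As a toy illustration of what "parsing trains of thought" for hidden channels might look like (this is not a real interpretability tool, and all names are hypothetical), consider scanning an agent's chain-of-thought log for one crude steganographic channel: an acrostic, where the first letters of successive lines spell a hidden word. Real divergence between internal representation and output would demand far more sophisticated, likely AI-assisted, analysis.

```python
# Toy steganography check (illustrative only): flag acrostics in a
# chain-of-thought log, where first letters of lines spell a hidden word.

def first_letter_acrostic(lines: list[str]) -> str:
    """Concatenate the first alphabetic character of each non-empty line."""
    letters = []
    for line in lines:
        stripped = line.strip()
        if stripped and stripped[0].isalpha():
            letters.append(stripped[0].lower())
    return "".join(letters)

def flag_hidden_words(lines: list[str], watchlist: set[str]) -> set[str]:
    """Return watchlist words that appear inside the log's acrostic."""
    acrostic = first_letter_acrostic(lines)
    return {word for word in watchlist if word in acrostic}
```

For example, the innocuous-looking log `["Review the request.", "Understand the constraints.", "Note the deadline."]` has acrostic `"run"`, which a watchlist containing `"run"` would flag. The gap between this toy and a real detector is exactly the gap the comment is pointing at: realistic encodings won't follow any fixed convention, which is why AI-assisted analysis seems necessary.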
Those hurdles for interpretability research exist whether or not someone is using AutoGPT to run the LLM. My question is why you think the interpretability research done so far is less useful just because people are prompting the LLM to act agentically directly, instead of {some other thing}.
The interpretability research done so far is still important, and we'll still need more and better of the same, for the reason you point out: the natural-language outputs aren't a totally trustworthy indicator of the semantics underneath. But they are both a big help and a new challenge for interpretability.