Jonathan Claybrough comments on Agentized LLMs will change the alignment landscape

Jonathan Claybrough 12 Apr 2023 13:36 UTC
1 point
0
Did you know about “by default, GPTs think in plain sight”?
It doesn’t explicitly talk about agentized GPTs, but it discusses the impact this has on GPTs for AGI, how it affects the risks, and what we should do about it (e.g. maybe RLHF is dangerous).
- Seth Herd 12 Apr 2023 17:07 UTC
  2 points
  0
  Thank you. I think it is relevant. I just found it yesterday following up on this. The comment there by Gwern is a really interesting example of how we could accidentally introduce pressure for them to use steganography so their thoughts aren’t in English.
  
  What I’m excited about is that agentizing them, while dangerous, could mean they not only think in plain sight, but they’re actually what gets used. That would cross from only being able to say how to get alignment, to making it something the world would actually do.