Thank you. I think it is relevant. I just found it yesterday following up on this. The comment there by Gwern is a really interesting example of how we could accidentally introduce pressure for them to use steganography so their thoughts aren’t in English.
What I’m excited about is that agentizing them, while dangerous, could mean they not only think in plain sight, but they’re actually what gets used. That would cross from only being able to say how to get alignment to making it something the world would actually do.