Diffusion Guided NLP: better steering, mostly a good thing
Link post
I think this is a very promising method for improving the steering of LLMs, which is great for reducing risk from model-originating harms like deception.
The flipside is that it increases misuse potential.
This is yet another way the safety gap could widen between closed-weight models with locked-down controls and open-weight models.