No! That’s why they’re clearly adversarial, as opposed to things that the density model gets right.
Thanks. (The alternative I was thinking of is that the prompt might look okay but cause the model to output a continuation that’s surprising and undesirable.)
No! That’s why they’re clearly adversarial, as opposed to things that the density model gets right.
Thanks. (The alternative I was thinking of is that the prompt might look okay but cause the model to output a continuation that’s surprising and undesirable.)