hold_my_fish comments on SolidGoldMagikarp (plus, prompt generation)

hold_my_fish 9 Feb 2023 20:37 UTC
1 point
Interesting, thanks. That makes me curious: do the adversarial text examples that trick the density model look intuitively ‘natural’ to us as humans?
- LawrenceC 9 Feb 2023 23:02 UTC
  5 points
  No! That’s why they’re clearly adversarial, as opposed to things that the density model gets right.
  - hold_my_fish 9 Feb 2023 23:55 UTC
    3 points
    Thanks. (The alternative I was thinking of is that the prompt might look okay but cause the model to output a continuation that’s surprising and undesirable.)