Eric Wallace comments on SolidGoldMagikarp (plus, prompt generation)

Eric Wallace 6 Feb 2023 16:25 UTC
58 points
7
You may also want to check out Universal Adversarial Triggers (https://arxiv.org/abs/1908.07125), an academic paper from 2019 that does the same thing as the above: the authors craft the optimal worst-case prompt to feed into a model, then use that prompt to analyze GPT-2 and other models.
- DanielFilan 10 Feb 2023 20:15 UTC
  5 points
  0
  I just skimmed that paper, but I think it doesn’t find tokens like “ SolidGoldMagikarp” that have the strange sort of behaviour described in this post. Am I missing something, or by “the exact same thing as the above” were you just referring to one particular section of the post?
- Jessica Rumbelow 6 Feb 2023 21:09 UTC
  1 point
  0
  Thanks—wasn’t aware of this!