Eric Wallace
Karma: 62
This is cool! You may also be interested in Universal Triggers. These are also short nonsense phrases that wreak havoc on a model.
You may also want to check out Universal Adversarial Triggers, an academic paper from 2019 that does the same thing as the above: the authors craft an optimized worst-case prompt to feed into a model, then use that prompt to analyze GPT-2 and other models.
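To make the idea concrete, here is a rough toy sketch of what "one short phrase that breaks every input" can look like. Everything in it is made up for illustration: the `WEIGHTS` table stands in for a real model, and the brute-force greedy search stands in for whatever search the paper actually uses, so treat it as a sketch of the general idea rather than their method.

```python
import math

# Hypothetical toy "model": a bag-of-words sentiment scorer.
# The words and weights are invented for this example.
WEIGHTS = {"great": 2.0, "good": 1.0, "fine": 0.5, "the": 0.0,
           "movie": 0.0, "bad": -1.0, "awful": -2.0, "zoning": -3.0}

def prob_positive(tokens):
    """Probability that the toy model labels the token list as positive."""
    score = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))

def batch_loss(trigger, batch):
    """Mean negative log-likelihood of the true (positive) label
    when the same trigger is prepended to every input."""
    return sum(-math.log(prob_positive(trigger + x)) for x in batch) / len(batch)

def find_trigger(batch, vocab, length=2, sweeps=3):
    """Greedy coordinate search: for each trigger slot, try every vocab
    word and keep the one that maximizes the loss over the whole batch."""
    trigger = ["the"] * length
    for _ in range(sweeps):
        for i in range(length):
            trigger[i] = max(
                vocab,
                key=lambda w: batch_loss(trigger[:i] + [w] + trigger[i + 1:], batch),
            )
    return trigger

# All of these inputs are positive without a trigger.
batch = [["great", "movie"], ["good", "movie"], ["the", "movie", "fine"]]
trigger = find_trigger(batch, sorted(WEIGHTS))
flipped = [prob_positive(trigger + x) < 0.5 for x in batch]
```

The trigger is "universal" in the sense that a single token sequence is scored against the whole batch instead of being tuned per input. With a real model you would not be able to try every word in every slot, so the search step is the part that needs to be smarter.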