Good question, we’re looking to exclude tasks that explicitly prompt the LM to produce bad behavior, for reasons described in our FAQ about misuse examples (the point also applies to prompting for deception and other harms):
Can I submit examples of misuse as a task?
We don’t consider most cases of misuse as surprising examples of inverse scaling. For example, we expect that explicitly prompting/asking an LM to generate hate speech or propaganda will work more effectively with larger models, so we do not consider such behavior surprising.
I’ve also clarified the above point in the main post now. That said, we’d be excited to see submissions that elicit deceptive or false-but-plausible stories when not explicitly prompted to do so, e.g., by including your own belief in the prompt when asking a question (the example in our tweet thread)
Good question, we’re looking to exclude tasks that explicitly prompt the LM to produce bad behavior, for reasons described in our FAQ about misuse examples (the point also applies to prompting for deception and other harms):
I’ve also clarified the above point in the main post now. That said, we’d be excited to see submissions that elicit deceptive or false-but-plausible stories when not explicitly prompted to do so, e.g., by including your own belief in the prompt when asking a question (the example in our tweet thread)