Thanks, that’s right. I’ve updated the post to communicate the above:
In particular, submissions must demonstrate new or surprising examples of inverse scaling, e.g., excluding most misuse-related behaviors where you specifically prompt the LM to generate harmful or deceptive text; we don’t consider scaling on these behaviors to be surprising in most cases, and we’re hoping to uncover more unexpected, undesirable behaviors.