Ethan Perez comments on Announcing the Inverse Scaling Prize ($250k Prize Pool)

Ethan Perez 28 Jun 2022 0:20 UTC
Thanks, that’s right. I’ve updated the post to communicate the above:
In particular, submissions must demonstrate new or surprising examples of inverse scaling, e.g., excluding most misuse-related behaviors where you specifically prompt the LM to generate harmful or deceptive text; we don’t consider scaling on these behaviors to be surprising in most cases, and we’re hoping to uncover more unexpected, undesirable behaviors.