New adversarial robustness scorecard: https://scale.com/leaderboard/adversarial_robustness. Yay DeepMind for being in the lead.
I plan to add an “adversarial robustness” criterion in the next major update to my https://ailabwatch.org scorecard and defer to Scale’s leaderboard for it (how exactly to turn their numbers into grades is TBD), unless someone convinces me that Scale’s leaderboard is bad or that something else is even better.