Regarding the “assessment platform with ScaleAI”: Does anybody here plan to take some academia-related actions based on this? Or do you know somebody who is considering this? (I am considering it myself, but see below.)
Context: This seems like an opportunity to legitimise red-teaming as an academically respectable activity, and also to have a more serious (incl. academic) discussion on mechanism/incentive design around red-teaming (cf. Legitimising AI Red-Teaming by Public). I am considering taking some actions in this respect, for example putting together a white paper or a call for proposals. However, other people might be interested in this as well, and it seems important to coordinate.
But yes, the event organizers will be writing a paper about it and publishing the data (after it’s been anonymized).
I imagine this would primarily be a report on the competition? What I was thinking about is more how this sort of assessment should be done in general, what the similarities and differences with cybersecurity are, and how to squeeze more utility out of it.
For example, one low-hanging fruit (in a naive version) would be to withhold 10% of the obtained data from the AI companies and test those jailbreak strategies against the patched models later, similarly to how we use held-out test data in ML. This would give us some insight into whether the current “alignment” methods generalise, or whether we are closer to playing whack-a-mole.
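To make the analogy concrete, here is a minimal sketch of what the held-out split and the later evaluation could look like. This is purely illustrative Python with hypothetical names (`query_patched_model`, `is_jailbroken`); it is not an actual API of the platform or the competition.

```python
import random

# Illustrative sketch only: the competition data is assumed to be a list of
# (jailbreak_prompt, target_behaviour) records, and query_patched_model /
# is_jailbroken are hypothetical stand-ins for whatever model access and
# grading we would actually have.

def split_holdout(records, holdout_fraction=0.1, seed=0):
    """Withhold a fraction of the red-teaming data from the AI companies."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]  # (released, withheld)

def holdout_success_rate(withheld, query_patched_model, is_jailbroken):
    """Later: how often do the withheld attacks still work on the patched models?"""
    successes = sum(
        is_jailbroken(query_patched_model(prompt)) for prompt, _ in withheld
    )
    return successes / len(withheld)
```

If the success rate on the withheld attacks stays high after the companies patch against the released 90%, that would be some evidence that we are playing whack-a-mole rather than fixing something that generalises.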
There are many more considerations, and many more things one could do. I don’t claim to have all the answers, nor to be the best person to write about them; I just think it would be good if somebody were doing this (and I am wondering whether that is already happening :-) ).
Red teaming has always been a legitimate academic thing? I don’t know what background you’re coming from but… you’re very far off.
Theoretical CS/AI/game theory, rather than cybersecurity. Given the lack of cybersec background, I acknowledge I might be very far off.
To me, it seems that the perception within cybersecurity might be different from the perception outside of it. Also, red teaming in the context of AI models might differ in important ways from the cybersecurity context. And red teaming by the public seems, to me, different from internal red-teaming or bounties. (Though this might be one of the things where I am far off.)