Thanks. I agree you’re pointing at something flawed in the current version and generally thorny. Strong-upvoted and strong-agreevoted.
> Generally, the deployment criteria should be gated behind “has a plan to do this when models are actually powerful and their implementation of the plan is credible”.
I didn’t put much effort into clarifying this kind of thing because it’s currently moot—I don’t think it would change any lab’s score—but I agree.[1] I think e.g. a criterion “use KYC” should technically be replaced with “use KYC OR say/demonstrate that you’re prepared to implement KYC and have some capability/risk threshold to implement it and [that threshold isn’t too high].”
> Don’t pass cost-benefit for current models, which pose low risk. (And it seems the criterion is “do you have them implemented right now?”) . . . .
> (A general problem with this project is somewhat arbitrarily requiring specific countermeasures. I think this is probably intrinsic to the approach, I’m afraid.)
Yeah. The criteria can be like “implement them or demonstrate that you could implement them and have a good plan to do so,” but it would sometimes be reasonable for the lab to not have done this yet. (Especially for non-frontier labs; the deployment criteria mostly don’t work well for evaluating non-frontier labs. Also if demonstrating that you could implement something is difficult, even if you could implement it.)
> I get the sense that these criteria don’t quite handle the necessary edge cases to accommodate reasonable choices orgs might make.
I’m interested in suggestions :shrug:
[1] And I think my site says some things that contradict this principle, like ‘these criteria require keeping weights private.’ Oops.
Hmm, yeah it does seem thorny if you can get the points by just saying you’ll do something.
Like I absolutely think this shouldn’t count for security. I think you should have to demonstrate actual security of model weights, and I can’t think of any demonstration of “we have the capacity to do security” which I would find fully convincing. (Though setting up some inference server at some point which is secure against highly resourced pen testers would be reasonably compelling for demonstrating part of the security portfolio.)