Anthropic said that collaborating with METR “requir[ed] significant science and engineering support on our end”; it has not clarified why.
I can comment on this (I think without breaking NDA). I will oversimplify. They were changing around their deployment system, infra, etc. We wanted uptime and throughput. It was a big pain in the ass to keep the model up (with proper access control) while they were overhauling stuff. Furthermore, Anthropic and METR kept changing points of contact (rapidly growing teams).
This was and is my proposal for evaluator model access: if at least 10 people at a lab can access a model, then at least one person at METR must have access.
This is for the labs to self-enforce via public agreements.
This seems like something they would actually agree to.
If it were a law, then you would replace METR with “a government-approved auditor”.
I think compliance could be greatly improved by getting labs to use a little login widget (it could be a CLI) which allows e.g. METR to see access-permission changes (possibly with codenames for models and/or people). Ideally this would be very little effort for labs, and sidestepping it would be more effort than using it once it was set up.
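To make this concrete, here is a minimal sketch of what such a widget could look like, assuming an append-only JSONL log that the auditor can read. Everything here (the file name, codenames, org labels, and CLI shape) is made up for illustration, not a description of any lab's real tooling; it just records access-permission changes and checks the "at least 10 lab people implies at least 1 evaluator" rule from above.

```python
#!/usr/bin/env python3
"""Hypothetical sketch of an access-logging CLI ("login widget").

Assumption: the log is an append-only JSONL file that an auditor
(e.g. METR) can read; models and people appear only as codenames.
"""
import json
import sys
from datetime import datetime, timezone
from pathlib import Path

LOG_PATH = Path("access_log.jsonl")   # append-only log shared with the auditor
EVALUATOR_ORG = "METR"                # or "a government-approved auditor"
LAB_ACCESS_THRESHOLD = 10             # >=10 lab people => >=1 evaluator must have access


def record_change(model: str, person: str, org: str, action: str) -> None:
    """Append one access-permission change ("grant" or "revoke") to the log."""
    entry = {
        "time": datetime.now(timezone.utc).isoformat(),
        "model": model,      # codename, e.g. "model-falcon"
        "person": person,    # codename, e.g. "person-17"
        "org": org,          # "lab" or the evaluator org
        "action": action,
    }
    with LOG_PATH.open("a") as f:
        f.write(json.dumps(entry) + "\n")


def current_access(model: str) -> dict:
    """Replay the log to get the current set of people with access, per org."""
    access: dict[str, set[str]] = {}
    if LOG_PATH.exists():
        for line in LOG_PATH.read_text().splitlines():
            e = json.loads(line)
            if e["model"] != model:
                continue
            people = access.setdefault(e["org"], set())
            if e["action"] == "grant":
                people.add(e["person"])
            else:
                people.discard(e["person"])
    return access


def policy_ok(model: str) -> bool:
    """The proposed rule: if >=10 lab people can access the model,
    at least one person at the evaluator org must also have access."""
    access = current_access(model)
    lab_count = len(access.get("lab", set()))
    evaluator_count = len(access.get(EVALUATOR_ORG, set()))
    return lab_count < LAB_ACCESS_THRESHOLD or evaluator_count >= 1


if __name__ == "__main__":
    # Usage (illustrative): python access_widget.py grant model-falcon person-17 lab
    _, action, model, person, org = sys.argv
    record_change(model, person, org, action)
    if not policy_ok(model):
        print(f"POLICY VIOLATION: {model} has >= {LAB_ACCESS_THRESHOLD} lab users "
              f"but nobody at {EVALUATOR_ORG} has access")
```

The point is not this particular implementation: it is that once something like this is the default path for granting access, logging happens for free and avoiding it takes extra work.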
Feedback welcome.
External red-teaming is not external model evaluation. External red-teaming … several people …. ~10 hours each. External model evals … experts … evals suites … ~10,000 hours developing.
Yes, there is some awkwardness here… Red-teaming could be extremely effective if structured as an open competition, possibly more effective than orgs like METR. The problem is that this trains up tons of devs on Doing Evil With AI and probably also produces lots of really useful GitHub repos. So I agree with you.
They were changing around their deployment system, infra, etc. We wanted uptime and throughput. It was a big pain in the ass to keep the model up (with proper access control) while they were overhauling stuff. Furthermore, Anthropic and METR kept changing points of contact (rapidly growing teams).
This sounds like an unusual source of difficulty. Some Anthropic statements have suggested that sharing is hard in general. I hope it is just stuff like this. [Edit: possibly those statements were referring specifically to deep model access or high-touch support. But then they don’t explain the labs’ lack of more basic sharing.]
Some Anthropic statements have suggested that sharing is hard in general.
If they said that, then they are speaking nonsense IMO. Once you have your stuff set up, it’s a button you click. You do have to trust that the evaluator won’t leak info or soil your reputation without good cause, though.
Jack Clark: “Pre-deployment testing is a nice idea but very difficult to implement,” from https://www.politico.eu/article/rishi-sunak-ai-testing-tech-ai-safety-institute/
Possibly he meant more than just technical difficulty. And possibly Politico took this out of context. But I agree this quote seems bad, and clarification would be nice.