Beth Barnes comments on Anthropic rewrote its RSP

Beth Barnes 16 Oct 2024 4:08 UTC
42 points
9
I’m glad you brought this up, Zac—seems like an important question to get to the bottom of!

METR is somewhat capacity constrained and we can’t currently commit to e.g. being available on a short notice to do thorough evaluations for all the top labs—which is understandably annoying for labs.

Also, we don’t want to discourage people from starting competing evaluation or auditing orgs, or otherwise “camp the space”.

We also don’t want to accidentally safety-wash -that post was written in particular to dispel the idea that “METR has official oversight relationships with all the labs and would tell us if anything really concerning was happening”

All that said, I think labs’ willingness to share access/information etc is a bigger bottleneck than METR’s capacity or expertise. This is especially true for things that involve less intensive labor from METR (e.g. reviewing a lab’s proposed RSP or evaluation protocol and giving feedback, going through a checklist of evaluation best practices, or having an embedded METR employee observing the lab’s processes—as opposed to running a full evaluation ourselves).

I think “Anthropic would love to pilot third party evaluations / oversight more but there just isn’t anyone who can do anything useful here” would be a pretty misleading characterization to take away, and I think there’s substantially more that labs including Anthropic could be doing to support third party evaluations.

If we had a formalized evaluation/auditing relationship with a lab but sometimes evaluations didn’t get run due to our capacity, I expect in most cases we and the lab would want to communicate something along the lines of “the lab is doing their part, any missing evaluations are METR’s fault and shouldn’t be counted against the lab”.