METR has not intended to claim that it has audited anything, or that it provides meaningful oversight or accountability, but there has been some confusion about whether METR is an auditor or is planning to become one.

To clarify this point:

METR’s top priority is to develop the science of evaluations, and we don’t need to be auditors in order to succeed at this.
- We aim to build evaluation protocols that evaluators or auditors can use, whether they are a government, an internal lab team, another third party, or a team at METR.
We should not be considered to have ‘audited’ GPT-4 or Claude.
- Those were informal pilots of what an audit might involve, or research collaborations, and did not provide meaningful oversight. For example, all of this work was under NDA: we had no right or responsibility to disclose our findings to anyone outside the labs, and there was no formal expectation that it would inform deployment decisions. We also did not have the access necessary to perform a proper evaluation. In the OpenAI case, as noted in their system card:
  
  “We granted the Alignment Research Center (ARC) early access to the models as a part of our expert red teaming efforts … We provided them with early access to multiple versions of the GPT-4 model, but they did not have the ability to fine-tune it. They also did not have access to the final version of the model that we deployed. The final version has capability improvements relevant to some of the factors that limited the earlier models power-seeking abilities, such as longer context length, and improved problem-solving abilities as in some cases we’ve observed. … fine-tuning for task-specific behavior could lead to a difference in performance. As a next step, ARC will need to conduct experiments that (a) involve the final version of the deployed model (b) involve ARC doing its own fine-tuning, before a reliable judgment of the risky emergent capabilities of GPT-4-launch can be made”.
We are, and have been, in conversation with frontier AI companies about whether they would like to work with us in the capacity of a third-party evaluator, with various options for how this could work.
- As it says on our website:
  
  “We have previously worked with Anthropic, OpenAI, and other companies to pilot some informal pre-deployment evaluation procedures. These companies have also given us some kinds of non-public access and provided compute credits to support evaluation research.
  We think it’s important for there to be third-party evaluators with formal arrangements and access commitments—both for evaluating new frontier models before they are scaled up or deployed, and for conducting research to improve evaluations.
  We do not yet have such arrangements, but we are excited about taking more steps in this direction.”
We are interested in conducting third-party evaluations and may hire and fundraise to do so, but we would also be happy to see other actors enter the space. Whether we expand our capacity here depends on many factors, such as:
- Whether governments mandate this kind of access or relationship.
- Whether governments want to work with third parties or conduct audits in-house.
- Whether frontier AI companies are keen to work with us in this capacity and to give us the access necessary to do so.
- How successful we are in hiring the talent we need to do this without detracting from our top priority of developing the science.
- Whether governments or other third-party evaluators can carry out evaluation protocols sufficiently well.
- Technical considerations about what kind of expertise good elicitation requires.
- Etc.

If you’re interested in helping METR conduct third-party evaluations in-house, or in helping governments or other auditors carry out the evaluation protocols we design, please express interest in working with us or apply to our current openings.

Clarifying METR’s Auditing Role

Beth Barnes, 30 May 2024 18:41 UTC

