According to METR, the organization that audited OpenAI, a dozen tasks indicate ARA capabilities.
Small comment, but @Beth Barnes of METR posted on Less Wrong just yesterday to say “We should not be considered to have ‘audited’ GPT-4 or Claude”.
This doesn’t appear to be a load-bearing point in your post, but would still be good to update the language to be more precise.
Small comment, but @Beth Barnes of METR posted on Less Wrong just yesterday to say “We should not be considered to have ‘audited’ GPT-4 or Claude”.
This doesn’t appear to be a load-bearing point in your post, but would still be good to update the language to be more precise.