Thanks for this—I wasn’t aware. This also makes me more disappointed with the voluntary commitments at the recent AI Safety Summit. As far as I can tell (based on your blog post), the language used around external evaluations was also quite soft:

“They should also consider results from internal and external evaluations as appropriate, such as by independent third-party evaluators, their home governments[3], and other bodies their governments deem appropriate.”
I wonder if it’d be possible to have much stronger language around this at next year’s Summit.
The EU AI Act is the only set of regulatory rules that isn’t voluntary (i.e., companies have to comply), and even it doesn’t require external evaluations (it uses the term “conformity assessment”) for all high-risk systems.
Any takes on what our best bet is for making high-quality external evals mandatory for frontier models at this stage?