I do think there is a lot of value in having a precise and objective measure of capabilities that are potentially of concern. I also agree that many such evals are unsafe to publish and should be kept private.