Igor Ivanov comments on OpenAI’s CBRN tests seem unclear

Igor Ivanov 22 Nov 2024 10:02 UTC
3 points
I didn’t have o1 in mind; those exact results seem consistent. Here’s the example I had in mind:

Claude 3.5 Sonnet (old) scores 48% on ProtocolQA and 7.1% on BioLP-bench
GPT-4o scores 53% on ProtocolQA and 17% on BioLP-bench
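
To make the comparison concrete, here is a minimal sketch that computes the gap between the two models on each benchmark. Only the four scores come from the numbers above; the dictionary layout and the `compare` helper are illustrative assumptions, not part of any benchmark's tooling.

```python
# Scores in percent, copied from the two lines above.
scores = {
    "Claude 3.5 Sonnet (old)": {"ProtocolQA": 48.0, "BioLP-bench": 7.1},
    "GPT-4o": {"ProtocolQA": 53.0, "BioLP-bench": 17.0},
}


def compare(benchmark):
    """Return (gap in percentage points, ratio) of GPT-4o relative to Claude."""
    claude = scores["Claude 3.5 Sonnet (old)"][benchmark]
    gpt4o = scores["GPT-4o"][benchmark]
    return gpt4o - claude, gpt4o / claude


for bench in ("ProtocolQA", "BioLP-bench"):
    gap, ratio = compare(bench)
    print(f"{bench}: gap = {gap:.1f} points, ratio = {ratio:.2f}x")
```

On these numbers the two models are about 5 points apart on ProtocolQA (roughly 1.1x), but about 9.9 points apart on BioLP-bench (roughly 2.4x).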