This press release (https://openai.com/index/openai-o1-system-card/) seems to conflate the o1 model with the weaker o1-preview and o1-mini models that were released yesterday. It would be nice if the press release were clearer that the reported results are for the weaker models, not for the more powerful o1 model. It might also make sense to retitle this post to refer to o1-preview and o1-mini.
Just to make things even more confusing, the main blog post sometimes compares o1 and o1-preview, with no mention of o1-mini:
On top of that, some testing was done on ‘pre-mitigation’ versions and some on ‘post-mitigation’ versions, and for the important red-teaming tests it’s not at all clear which version each test was run against (‘red teamers had access to various snapshots of the model at different stages of training and mitigation maturity’). And confusingly, for the jailbreak tests, ‘human testers primarily generated jailbreaks against earlier versions of o1-preview and o1-mini, in line with OpenAI’s policies. These jailbreaks were then re-run against o1-preview and GPT-4o’. It’s not at all clear to me how the latest versions of o1-preview and o1-mini would fare against jailbreaks created for those versions rather than for earlier ones. At worst, OpenAI added mitigations against those specific jailbreaks and then retested, and it’s those results that we’re seeing.