Why is that comparison not to the much better full o1 then, or the doubtless better o1 now?
I’m not sure the full o1 model has even been benchmarked yet, let alone released, so that’s why they are focused on o1-preview.
Benchmarks for o1 were included in the o1/o1-preview announcement, and you could eyeball the jumps as roughly equal for 4o → o1-preview → o1. (Another way to put it: the o1-preview you have access to captures only half the total gain.) So if you only match o1-preview at its announcement, you were already far behind the full o1 back then, and you are further behind now.
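To make the "half the total gain" point concrete, here is a minimal sketch with purely hypothetical scores (the real announcement numbers vary by benchmark): it just computes what fraction of the 4o → o1 improvement an intermediate model like o1-preview captures, assuming the two jumps really are roughly equal.

```python
# Hypothetical benchmark scores (NOT the real announcement numbers),
# chosen only to illustrate the "roughly equal jumps" pattern.
scores = {"4o": 20.0, "o1-preview": 50.0, "o1": 80.0}

total_gain = scores["o1"] - scores["4o"]            # 60 points from 4o to o1
preview_gain = scores["o1-preview"] - scores["4o"]  # 30 points from 4o to o1-preview

# Fraction of the total 4o -> o1 improvement that o1-preview captures.
fraction = preview_gain / total_gain
print(f"o1-preview captures {fraction:.0%} of the 4o -> o1 gain")  # 50%
```

On those assumed numbers, matching o1-preview closes only half of the headline 4o → o1 gap, which is the thrust of the objection above.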