The way o1's performance falls off much faster than o3's as ARC-AGI problem size grows is significant evidence that o3 is built on a different base model than o1, one with better long-context training or different handling of attention in the model architecture. So probably a post-trained Orion/GPT-4.5o.