Mostly faster benchmark progress than I expected (see Ajeya's comment here), and o3 (and o1) being evidence that RL training can work at scale and can plausibly scale very far.