I’ve updated toward the views Daniel expresses here, and I’m now about halfway between Ajeya’s views in this post and Daniel’s (taking a geometric mean).
I’m curious what the biggest factors were that made you update?
Mostly faster benchmark performance than I expected (see Ajeya’s comment here), and o3 (and o1) being evidence that RL training works at scale and can plausibly be scaled very far.