Solving competition problems could well be like a chess-playing AI playing chess well. The question is whether it generalizes far enough: can the method be applied to train the AI on a wide variety of tasks that are not like competition problems (which are distinctive in that it is possible to write verifiers for attempted solutions)? We know that this is not the case with AlphaZero. Is it the case with o3-like methods? Hard to tell. I don’t see how it could be known either way yet.