It’s important to remember that o3’s score on ARC-AGI is “tuned” while previous AIs’ scores are not “tuned.” Being explicitly trained on example test questions gives it a major advantage.
According to François Chollet (ARC-AGI designer):
Note on “tuned”: OpenAI shared they trained the o3 we tested on 75% of the Public Training set. They have not shared more details. We have not yet tested the ARC-untrained model to understand how much of the performance is due to ARC-AGI data.
It’s interesting that OpenAI did not test how well o3 would have done before it was “tuned.”
EDIT: People at OpenAI deny “fine-tuning” o3 for the ARC (see this comment by Zach Stein-Perlman). But to me, the denials sound like “we didn’t use a separate derivative of o3 (that’s fine-tuned for just the test) to take the test, but we may have still done reinforcement learning on the public training set.” (See my reply)