You’re misunderstanding the nature of the semi-private test set (which you referred to as test one) and the private test set (which you referred to as test two).
o3 can’t take the private test set because only models whose source code is provided to the test creator, and which run on the ARC-AGI server with no internet access, can take that test. The purpose is to prevent contamination of the test set: as soon as a proprietary model with internet access takes the test, it’s pretty much guaranteed that the questions are viewable by the owner of the model. The only way to prevent that is for the owner of the model to provide the source code and run the test offline.
So a new OpenAI model could never take that test, because OpenAI is too greedy to open-source its models. The reward for a score of 85% or higher on the private test set is $600,000 USD, a reward that naturally has yet to be claimed and that I expect will not be claimed for some time.
However, I agree that o3’s score on the semi-private test set is not impressive. All of these questions are technically viewable by OpenAI because they have run their other models on it, so their models have been asked these 100 questions before. OpenAI is an (aspiring) for-profit company, and I would not put it past them to train o3 on the direct questions from this test set, considering how much money they have to gain when they go public, and how much money they need from investors as long as they remain a not-for-profit. This whole thing has been massively overhyped, and I wouldn’t be surprised if the creator of the test received a kickback, considering how much he has been publicly glazing them.
It’s very frustrating to see them fool so many people by trying to use this result to claim that they are on the brink of AGI.