The ARC public test set is on GitHub and is almost certainly in GPT-4o’s training data.
Your model has trained on the benchmark it’s claiming to beat.
Judging by the new semi-private evaluation set, this doesn’t appear to matter.
See here for context.