I think BIG-bench could be the final AI benchmark: if a language model surpasses the top human score on it, the model is an AGI. At this point, there is nowhere to move the goalposts.
But when you say:
the benchmark is still growing. The organizers keep it open for submissions.
Doesn’t that mean this benchmark is a set of moving goalposts?
Good catch! You’re right: if contributors keep adding harder and harder tasks to the benchmark, and do it fast enough, the benchmark could stay ahead forever.
I expect that the benchmark will be frozen some day. And even if it isn’t, new tasks are added only a few times per month these days, so solving its current version is not impossible.