You’re right. And some of the existing tasks in the benchmark are way beyond the abilities of baseline humans (e.g. the image classification task where the images are given as hex-encoded text).
On the other hand, the organizers allowed the human testers to use any tools they wanted, including internet search, software, etc. So the measured top-human performance is the performance of humans augmented with technology.
I think an AI that can solve BIG-bench must be an AGI. But there could be an AGI that can’t solve BIG-bench yet.