> in each of the 50 different subject areas that we tested it on, it’s as good as the best expert humans in those areas
That sounds like an incredibly strong claim, but I suspect that the phrasing is very misleading. What kind of tests is Hassabis talking about here? Maybe those are tests that rely on remembering known facts much more than on making novel inferences? Surely Gemini is not (say) as good as the best mathematicians at solving open problems in mathematics?
The YouTube channel AI Explained looked into this: what the claim actually means is that Gemini scores better than the reported human baseline at matching the answer keys on MMLU multiple-choice questions. Apparently roughly 2% of that dataset’s answer keys are wrong anyway, so the claim is even weaker than the multiple-choice format alone would already make it.
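To see why an answer-key error rate like that matters, here is a minimal back-of-the-envelope sketch. The ~2% figure and the 4-way multiple-choice format are assumptions taken from the discussion above, not measured values:

```python
# Rough sketch of why benchmark label errors cap measurable accuracy.
# Assumes ~2% of MMLU answer keys are wrong (the figure cited above)
# and 4 answer choices per question; both numbers are illustrative.

label_error_rate = 0.02   # assumed fraction of questions with a wrong answer key
n_choices = 4             # MMLU questions have 4 answer choices

# A model that always picks the truly correct answer still "misses"
# every mislabeled question, because the key there is wrong by
# definition. So its measured accuracy is capped below 100%:
perfect_model_ceiling = 1.0 - label_error_rate

# Random guessing, by contrast, matches the key (right or wrong)
# 1/n_choices of the time:
random_baseline = 1.0 / n_choices

print(f"Measured ceiling for a perfect model: {perfect_model_ceiling:.1%}")
print(f"Random-guess baseline:                {random_baseline:.1%}")
```

The point is just that scores near the top of the scale stop distinguishing model quality from dataset quality: a model that beats a 98% ceiling is partly being graded on reproducing the dataset’s mistakes.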