The YouTube channel AI Explained looked into this. What it means is that it scores better than humans at matching the labels for “is correct answer” on MMLU multiple-choice questions. Apparently about 2% of the answers in that dataset are wrong anyway, so it’s even worse than just the fact that the questions are only multiple choice.