@johnswentworth FWIW, GPQA Diamond seems much harder than GPQA main to me, and current models perform well on it. I suspect these models beat your performance on GPQA Diamond if you’re allowed 30 minutes per problem. I wouldn’t be shocked if you beat them (maybe I’m like 20%?), but that would be because you’re unusually broadly knowledgeable about science, not just because you’re smart.
I personally get wrecked by GPQA chemistry, get ~50% on GPQA biology if I have like 7 minutes per problem (which is notably better than GPQA's experts from other fields get, with much less time), and get like ~80% on GPQA physics with less than 5 minutes per problem. But GPQA Diamond seems much harder.
Is this with internet access for you?
Yes, I’d be way worse off without internet access.