johnswentworth comments on johnswentworth’s Shortform

johnswentworth 27 Dec 2024 20:17 UTC
LW: 9 AF: 7
I mean, there are lots of easy benchmarks on which I can solve the large majority of the problems, a language model can also solve the large majority of the problems, and the language model can often have a somewhat lower error rate than me if it’s been optimized for that. Seems like GPQA (and GPQA Diamond) is yet another example of such a benchmark.
- Buck 28 Dec 2024 17:36 UTC
  LW: 2 AF: 3
  What do you mean by “easy” here?