Most benchmarks face inherent problem of goodhart’s law: as soon as they become a target metric, efforts converge on optimizing for the benchmark itself, potentially diverging from the capabilities it was meant to measure.
Most benchmarks face inherent problem of goodhart’s law: as soon as they become a target metric, efforts converge on optimizing for the benchmark itself, potentially diverging from the capabilities it was meant to measure.