gwern comments on Three Subtle Examples of Data Leakage

gwern 3 Oct 2024 15:17 UTC
14 points
3
But of course you were engaged in meta-overfitting by the constant attack on the test dataset… How did you wind up detecting the leakage? Bad results when deployed to the real world?
- SarahNibs 3 Oct 2024 18:58 UTC
  7 points
  0
  Not to toot my own horn*, but we detected it when I was given the project of turning some of our visualizations into something that could accept QA’s format, so they could look at their results using those visualizations, and then I was like “… so how does QA work here, exactly? Like what’s the process?”
  I do not know the real-world impact of fixing the overfitting.
  *tooting one’s own horn always follows this phrase