I endorse this comment for the record.
I’m considering editing the blog post to clarify.
If I had known that prior work got a wildly different score on the public test set (comparable to the score I get), I wouldn’t have claimed SOTA.
(That said, as you note, it seems reasonably likely, though unclear, that this prior solution was overfit to the test set while my solution is not.)
I’m submitting to the private leaderboard (with fewer samples than used in this post). If results indicate that SOTA is unlikely, I’ll retract my claim.
I edited to add:
(Edit: But see this comment and this comment for important clarifications.)
And changed from “this dataset” to “a similarly difficult dataset”.