The paper suggests strong evidence of widespread task contamination. Performance on data sets released after the training data creation
I haven’t read the paper so maybe it addresses this but… generally speaking when people create new data sets they try to make them more difficult than the already-existing ones, since the already-existing ones are getting saturated.
I haven’t read the paper so maybe it addresses this but… generally speaking when people create new data sets they try to make them more difficult than the already-existing ones, since the already-existing ones are getting saturated.