Maybe noise makes training worse because, with insufficient data, the model can't learn to just ignore it? (E.g., noisier training means slower convergence and lower compute efficiency.)
Also, does this decrease the size of the dataset by a factor of 5 in the uniform noise case? (Or did they normalize for this by using a fixed set of labeled data and then just adding additional noise labels?)
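To make the two readings concrete, here is a rough sketch of what I mean. It is my own illustration, not the authors' code: the function names are hypothetical, and I am assuming that "factor of 5" corresponds to a noise fraction of 0.8 (so only 1 in 5 labels stays clean).

```python
import random

def corrupt_in_place(labels, noise_frac, num_classes, rng):
    """Reading A (assumed): total size is fixed and a fraction of the
    labels is overwritten with uniformly random labels, so the number
    of untouched labels shrinks."""
    n_noisy = int(round(noise_frac * len(labels)))
    noisy_idx = set(rng.sample(range(len(labels)), n_noisy))
    out = [rng.randrange(num_classes) if i in noisy_idx else y
           for i, y in enumerate(labels)]
    # Count of untouched labels. A random label can still match the
    # true one by chance, so this is a lower bound on correct labels.
    return out, len(labels) - n_noisy

def add_noise_examples(labels, noise_frac, num_classes, rng):
    """Reading B (assumed): every clean label is kept, and extra
    uniformly random labels are appended until they make up
    noise_frac of the total."""
    n_clean = len(labels)
    n_noisy = int(round(n_clean * noise_frac / (1 - noise_frac)))
    out = list(labels) + [rng.randrange(num_classes) for _ in range(n_noisy)]
    return out, n_clean

rng = random.Random(0)
labels = [i % 10 for i in range(1000)]  # toy 10-class labels

a, clean_a = corrupt_in_place(labels, 0.8, 10, rng)
b, clean_b = add_noise_examples(labels, 0.8, 10, rng)
print(len(a), clean_a)  # fixed total, fewer clean labels
print(len(b), clean_b)  # larger total, same clean labels
```

Under reading A the model sees 5x fewer clean labels for the same dataset size; under reading B the clean set is unchanged and only the compute per clean example goes up. The two would explain a drop in performance quite differently.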
This is in the data-constrained case, right?