Isn’t multi-epoch training most likely to lead to overfitting, making the models less useful/powerful?
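For what it’s worth, the failure mode behind that question is easy to demonstrate on a toy problem. Below is a minimal numpy sketch (a small polynomial regression, nothing LLM-specific; every name and number in it is made up for illustration) where extra optimization passes keep improving training loss while held-out loss eventually degrades:

```python
import numpy as np

# A minimal sketch of "more epochs -> overfitting" on a toy problem
# (polynomial regression in numpy; illustrative only, not an LLM).
rng = np.random.default_rng(0)
X_tr = rng.uniform(-1, 1, 20)            # 20 noisy training points
y_tr = np.sin(3 * X_tr) + 0.3 * rng.normal(size=20)
X_va = np.linspace(-1, 1, 200)           # clean held-out grid
y_va = np.sin(3 * X_va)

def phi(x, degree=15):
    # Degree-15 monomial features: high capacity relative to 20 points.
    return np.stack([x ** d for d in range(degree + 1)], axis=1)

P_tr, P_va = phi(X_tr), phi(X_va)
w = np.zeros(P_tr.shape[1])
for step in range(1, 100_001):
    w -= 0.1 * P_tr.T @ (P_tr @ w - y_tr) / len(y_tr)
    if step in (100, 1_000, 10_000, 100_000):
        tr = np.mean((P_tr @ w - y_tr) ** 2)
        va = np.mean((P_va @ w - y_va) ** 2)
        print(f"step {step:>6}: train MSE {tr:.4f}  val MSE {va:.4f}")
# Typical pattern: train MSE falls monotonically, while val MSE bottoms
# out part-way through and then creeps upward as the extra passes fit
# the noise. Exact numbers vary with the seed and learning rate.
```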
If it were possible to write an algorithm to generate this synthetic training data, how would the resulting training data have any more information content than the algorithm that produced it? Sure, you’d get an enormous increase in training-text volume, but a large volume of training data containing only a small amount of information seems counterproductive for training purposes: it would just bias the model disproportionately toward that small amount of information.
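That intuition has a tidy information-theoretic form: the output of a deterministic generator can’t contain more information than the program plus its random seed, however many tokens it emits. A minimal sketch of the point (a hypothetical template-based generator; all names here are made up for illustration):

```python
import random

# A tiny template-based "synthetic corpus" generator. However many
# gigabytes it emits, its output is fully determined by this short
# program plus the RNG seed, so the corpus carries no more information
# than the two combined.
TEMPLATES = [
    "The {adj} {noun} {verb} the {noun2}.",
    "Every {noun} is {adj}, so the {noun2} {verb} it.",
]
WORDS = {
    "adj": ["large", "small", "red", "fast"],
    "noun": ["model", "dataset", "token", "epoch"],
    "noun2": ["loss", "gradient", "corpus", "batch"],
    "verb": ["updates", "predicts", "overfits", "memorizes"],
}

def generate(n_sentences: int, seed: int = 0):
    rng = random.Random(seed)  # all "novelty" comes from this seed
    for _ in range(n_sentences):
        template = rng.choice(TEMPLATES)
        yield template.format(
            adj=rng.choice(WORDS["adj"]),
            noun=rng.choice(WORDS["noun"]),
            noun2=rng.choice(WORDS["noun2"]),
            verb=rng.choice(WORDS["verb"]),
        )

if __name__ == "__main__":
    for line in generate(5, seed=42):
        print(line)
```

However long you let it run, a compressor would crush the output back toward the size of the script plus the seed, which is roughly the worry stated above: a model trained on it can only learn the generator’s distribution, not anything beyond it.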