I’ve seen mixed data on how important curricula are for deep learning. One paper (on CIFAR) suggested that curricula only help if you have very few datapoints or the labels are noisy. But possibly that doesn’t generalize to LLMs.
I think data ordering basically never matters for LLM pretraining. (As in, random is the best and trying to make the order more specific doesn’t help.)
I’ve seen mixed data on how important curricula are for deep learning. One paper (on CIFAR) suggested that curricula only help if you have very few datapoints or the labels are noisy. But possibly that doesn’t generalize to LLMs.
I think data ordering basically never matters for LLM pretraining. (As in, random is the best and trying to make the order more specific doesn’t help.)
That was my impression too.