I think data ordering basically never matters for LLM pretraining. (As in, a random order is best, and trying to impose a more deliberate order doesn’t help.)
That was my impression too.