You could literally go through some giant corpus with an LLM and see which samples have gradients similar to those from training on a spelling task.
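To make that concrete, here is a toy sketch of the idea, not anything run on a real LLM. It assumes a tiny logistic-regression model as a stand-in for the LLM, and cosine similarity as the measure of "similar gradients"; both are my assumptions, as are all the function names and the made-up data. With an actual LLM you would get the per-sample gradients from autograd rather than from a hand-written formula.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_gradient(weights, features, label):
    """Gradient of the logistic loss w.r.t. the weights for one sample."""
    pred = sigmoid(sum(w * x for w, x in zip(weights, features)))
    return [(pred - label) * x for x in features]


def mean_gradient(weights, samples):
    """Average per-sample gradient over a small set of task examples."""
    grads = [loss_gradient(weights, x, y) for x, y in samples]
    return [sum(col) / len(grads) for col in zip(*grads)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_gradient_similarity(weights, task_samples, corpus):
    """Return (similarity, corpus_index) pairs, most task-like sample first."""
    task_grad = mean_gradient(weights, task_samples)
    scored = [
        (cosine(loss_gradient(weights, x, y), task_grad), i)
        for i, (x, y) in enumerate(corpus)
    ]
    return sorted(scored, reverse=True)


# Hypothetical toy data: (features, label) pairs.
weights = [0.0, 0.0]                      # stand-in for the model parameters
task_samples = [([1.0, 0.0], 1.0), ([2.0, 0.0], 1.0)]   # the "spelling task"
corpus = [
    ([1.0, 0.0], 1.0),   # pushes the weights the same way as the task
    ([0.0, 1.0], 1.0),   # touches an unrelated weight
    ([1.0, 0.0], 0.0),   # pushes the weights the opposite way
]

ranking = rank_by_gradient_similarity(weights, task_samples, corpus)
for score, index in ranking:
    print(f"corpus[{index}]: cosine similarity {score:+.2f}")
```

Samples at the top of the ranking are the ones whose update would move the model in roughly the same direction as training on the task would; samples near zero are unrelated, and negative ones pull the other way.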