If you have a big pile of text that you want people to train their LLMs on, I recommend compiling and publishing it as a Hugging Face dataset.
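
As a rough illustration of the "compiling" step, here is a minimal sketch that gathers a directory of `.txt` files into a single JSON Lines file with one record per document. The function name `compile_jsonl`, the `id`/`text` field names, and the file layout are my own assumptions for the example, not a required schema. The upload step is left as comments because it needs the third-party `datasets` library and a logged-in Hugging Face account; the repository id shown there is a placeholder.

```python
import json
from pathlib import Path


def compile_jsonl(source_dir: str, out_path: str) -> int:
    """Write one JSON object per non-empty .txt file; return the record count."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        # Sort so the output order is stable from run to run.
        for path in sorted(Path(source_dir).glob("*.txt")):
            text = path.read_text(encoding="utf-8").strip()
            if not text:
                continue  # skip empty files
            record = {"id": path.stem, "text": text}
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


# Publishing (assumes `pip install datasets` and a prior Hugging Face login;
# "your-username/your-corpus" is a placeholder repository id):
#
#   from datasets import load_dataset
#   ds = load_dataset("json", data_files="train.jsonl")
#   ds.push_to_hub("your-username/your-corpus")
```

Calling `compile_jsonl("my_texts", "train.jsonl")` would produce the file that the commented upload step reads. One record per document keeps the data easy to stream and filter, though you may prefer a different granularity (for example, one record per paragraph) depending on the corpus.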