Nvidia just low-key released its own 340B-parameter model, Nemotron-4 340B. For those of you worried about the release of model weights becoming the norm, this will probably aggravate those fears.
Here is the link: https://research.nvidia.com/publication/2024-06_nemotron-4-340b
Oh, and they also released their synthetic data generation pipeline:
https://blogs.nvidia.com/blog/nemotron-4-synthetic-data-generation-llm-training/
For those curious about the performance: eyeballing the technical report, it performs roughly at the level of Llama-3 70B. It seems to have an inferior parameters-to-performance ratio, likely because it was trained on only 9 trillion tokens, while the Llama-3 models were trained on 15 trillion. It was also trained with a 4k context length, as opposed to Llama-3's 8k. Its primary purpose seems to be powering the synthetic data generation pipeline linked above.
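To make the "synthetic data pipeline" point concrete, here is a minimal sketch of the kind of generate-then-filter loop the blog post describes: sample candidate responses from an instruct model, score them with a reward model, and keep only high-scoring pairs as training data. The `generate_responses` and `score_response` functions below are hypothetical stand-ins for the Nemotron-4 340B Instruct and Reward model calls, not NVIDIA's actual API.

```python
# Hedged sketch of a generate-then-filter synthetic data loop.
# The two model calls are placeholders standing in for an instruct model
# (generation) and a reward model (scoring); they are NOT NVIDIA's API.

from typing import List, Tuple


def generate_responses(prompt: str, n: int = 4) -> List[str]:
    # Placeholder for sampling n candidate responses from an instruct model.
    return [f"candidate {i} for: {prompt}" for i in range(n)]


def score_response(prompt: str, response: str) -> float:
    # Placeholder for a reward model scoring helpfulness/quality.
    return float(len(response) % 10)  # dummy score


def build_synthetic_pairs(
    prompts: List[str], threshold: float = 5.0
) -> List[Tuple[str, str]]:
    """Keep the best-scoring (prompt, response) pair per prompt if it clears a threshold."""
    kept = []
    for prompt in prompts:
        candidates = generate_responses(prompt)
        scored = [(score_response(prompt, r), r) for r in candidates]
        best_score, best_response = max(scored)
        if best_score >= threshold:
            kept.append((prompt, best_response))
    return kept


if __name__ == "__main__":
    pairs = build_synthetic_pairs(["Explain KV caching.", "Summarize attention."])
    print(f"kept {len(pairs)} synthetic training pairs")
```

The point of releasing a base, instruct, and reward model together is that all three roles in a loop like this can be filled locally, which is what makes the 340B family interesting as a data generator for training smaller models.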