For those curious about the performance: eyeballing the technical report, it roughly performs at the level of Llama-3 70B. It seems to have an inferior parameters-to-performance ratio, likely because it was only trained on 9 trillion tokens, while the Llama-3 models were trained on 15 trillion. It's also trained with a 4k context length as opposed to Llama-3's 8k. Its primary purpose seems to be serving as part of a synthetic data generation pipeline.