Llama had to be so big to be SOTA.
How many parameters do you estimate for other SOTA models?
Mistral had like 150B parameters or something.