Yi-Lightning is a small MoE model that is extremely fast and inexpensive. Kai-Fu Lee, CEO of 01 AI, posted on LinkedIn: "Yi-Lightning costs only $0.14 (RMB 0.99) per million tokens [...] Yi-Lightning was pre-trained on 2000 H100s for 1 month, costing about $3 million, a tiny fraction of Grok-2."
Assuming it's trained in BF16 with 40% compute utilization, that's a 2e24 FLOPs model (Llama-3-70B is about 6e24 FLOPs, but it's not MoE, so its FLOPs are not used as efficiently). Judging from the per-token price, it likely has 10-20B active parameters, which implies training on 15-30T tokens. So this is not an exercise in extreme compute scaling, just excellent execution.
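For concreteness, here is a minimal sketch of that back-of-the-envelope arithmetic, assuming an H100 BF16 dense peak of roughly 989 TFLOP/s and the standard C ≈ 6ND approximation for transformer training compute (both are my assumptions, not figures from the post):

```python
# Back-of-the-envelope check of the Yi-Lightning training estimate.
# Assumed constants (not from the source): H100 dense BF16 peak throughput,
# and the C ~ 6 * N * D rule of thumb for transformer training FLOPs.

H100_BF16_PEAK = 989e12   # FLOP/s per H100 (dense BF16)
GPUS = 2000
MFU = 0.40                # assumed model FLOPs utilization
SECONDS = 30 * 24 * 3600  # one month of training

compute = GPUS * H100_BF16_PEAK * MFU * SECONDS
print(f"total training compute: {compute:.1e} FLOPs")  # ~2.1e24

# Invert C ~ 6 * N * D to get training tokens D for a given
# active-parameter count N.
for n_active in (10e9, 20e9):
    tokens = compute / (6 * n_active)
    print(f"{n_active / 1e9:.0f}B active params -> {tokens / 1e12:.0f}T tokens")
# -> roughly 17-35T tokens, consistent with the 15-30T ballpark above
```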