Yi-Lightning (01 AI) Chatbot Arena results are surprisingly strong for its price, which puts it at about 10B active parameters[1]. It's above Claude 3.5 Sonnet and GPT-4o in Math, and above Gemini 1.5 Pro 002 in English and Hard Prompts (English). It's above all non-frontier models in Coding and Hard Prompts (both with Style Control), including Qwen-2.5-72B (trained on 18T tokens). It would be interesting to know whether this is mostly better methodology, or compute scaling getting taken more seriously for a tiny model.
The developer's site says it's a MoE model. The developer's API docs list it at ¥0.99/1M tokens. The currency must be Renminbi, so that's about $0.14. For comparison, Together serves Llama-3-8B for $0.10-0.18 per million tokens, Qwen-2.5-7B for $0.30, and all MoE models up to 56B total (not active) parameters for $0.60. (Prices for open weights models won't carry significant margins, and their sizes are known, unlike with lightweight closed models.)
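For a rough sense of scale, here is a back-of-the-envelope version of that price comparison; the ~7 RMB per USD exchange rate is my assumption, and the Together prices are simply the figures quoted above:

```python
# Rough per-token price comparison for Yi-Lightning vs. Together's open-weights serving.
rmb_per_usd = 7.0  # assumed exchange rate, not from the source

yi_lightning_rmb_per_mtok = 0.99
yi_lightning_usd_per_mtok = yi_lightning_rmb_per_mtok / rmb_per_usd
print(f"Yi-Lightning: ~${yi_lightning_usd_per_mtok:.2f} per 1M tokens")  # ~$0.14

# Together's listed prices per 1M tokens, as quoted in the text above.
together_usd_per_mtok = {
    "Llama-3-8B": (0.10, 0.18),
    "Qwen-2.5-7B": (0.30, 0.30),
    "MoE up to 56B total params": (0.60, 0.60),
}
for model, (lo, hi) in together_usd_per_mtok.items():
    print(f"{model}: ${lo:.2f}-${hi:.2f} per 1M tokens")
```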
Kai-Fu Lee, CEO of 01 AI, posted on LinkedIn:

Yi-Lightning is a small MOE model that is extremely fast and inexpensive. Yi-Lightning costs only $0.14 (RMB 0.99)/mil tokens [...] Yi-Lightning was pre-trained on 2000 H100s for 1 month, costing about $3 million, a tiny fraction of Grok-2.
Assuming it's trained in BF16 with 40% compute utilization, that's a 2e24 FLOPs model (Llama-3-70B is about 6e24 FLOPs, but it's dense rather than MoE, so its FLOPs aren't used as efficiently). Assuming from the per-token price that it has 10-20B active parameters, it was trained on roughly 15-30T tokens. So not an exercise in extreme compute scaling, just excellent execution.
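A minimal sketch of that arithmetic, assuming ~1e15 dense BF16 FLOP/s per H100, a 30-day month, and the usual C ≈ 6·N·D rule of thumb; it reproduces the ~2e24 FLOPs figure and roughly the 15-30T token range:

```python
# Training-compute estimate from the quote: 2000 H100s for 1 month.
# Assumptions: ~1e15 dense BF16 FLOP/s per H100 and 40% utilization (MFU).
h100_bf16_flops = 1e15        # ~989 TFLOP/s, rounded
gpus = 2000
seconds = 30 * 24 * 3600      # one 30-day month
utilization = 0.40

compute = gpus * h100_bf16_flops * utilization * seconds
print(f"training compute ~ {compute:.1e} FLOPs")  # ~2e24

# Token count from C ~ 6 * N * D, for the assumed 10-20B active parameters.
for active_params in (10e9, 20e9):
    tokens = compute / (6 * active_params)
    print(f"{active_params / 1e9:.0f}B active params -> ~{tokens / 1e12:.0f}T tokens")
```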