Thanks, that looks really useful! Do you have GPU price performance numbers for lower precision training? Models like Chinchilla were trained in bf16, so that seems a more relevant number.
Thanks Neel!
The difference between FP16 (tensor core) and FP32 comes to a ~15x factor, IIRC. That said, ML developers seem to prioritise characteristics other than cost-effectiveness when choosing GPUs, such as raw performance and interconnect, so you can't just multiply the top price-performance we showcase by this factor and expect it to match the cost-performance of the largest ML runs today.
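For concreteness, here's a minimal sketch of where a factor in that ballpark comes from, using NVIDIA's published peak-throughput numbers (taken from the V100/A100 datasheets for illustration, not from our dataset):

```python
# Ratio of tensor-core FP16 to FP32 peak throughput for two common
# training GPUs, per NVIDIA's datasheets (dense, non-sparse numbers).
specs = {
    # GPU: (FP32 TFLOPS, FP16 tensor-core TFLOPS)
    "V100": (15.7, 125.0),
    "A100": (19.5, 312.0),
}

for gpu, (fp32, fp16_tc) in specs.items():
    print(f"{gpu}: tensor-core FP16 is {fp16_tc / fp32:.1f}x FP32")
# V100: tensor-core FP16 is 8.0x FP32
# A100: tensor-core FP16 is 16.0x FP32
```

So the exact multiplier depends on which generation of GPU you look at, which is part of why I'd hedge on the x15.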
More soon-ish.