Here’s an equation for the MMLA vs. Loss plot: MMLA = −2.468 × Loss + 5.5174
An MMLA of 100% corresponds to a loss of 1.8304 (a quick check of this number follows the list below). Using the scaling laws listed here, this loss can be reached in any of the following ways:
The GPT-4 dataset (~4 trillion tokens) and a model 11x the size of Megatron-Turing NLG (about 6 trillion parameters). Compute time: 111 days on Eos.
GPT-4's 175B params with 18.5 trillion training tokens (4.6x the size of GPT-4's dataset). Compute time: 16 days on Eos, but getting that many tokens may be a problem.
Megatron-Turing NLG's 530B parameters and 8.5 trillion tokens (2.1x the size of GPT-4's dataset). Compute time: 23 days on Eos. This dataset size is much more attainable.
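As a quick check on the loss target quoted above, here is a minimal Python sketch that inverts the linear fit (the fit only reproduces the 1.8304 figure if MMLA is treated as a fraction, so 100% = 1.0); the function name is just for illustration.

```python
# Invert the linear fit MMLA = -2.468 * loss + 5.5174 to find the loss
# corresponding to a given MMLA score (expressed as a fraction, so 100% = 1.0).

SLOPE = -2.468
INTERCEPT = 5.5174

def loss_for_mmla(mmla: float) -> float:
    """Solve mmla = SLOPE * loss + INTERCEPT for loss."""
    return (mmla - INTERCEPT) / SLOPE

print(loss_for_mmla(1.0))  # ~1.8304, the loss quoted above for MMLA = 100%
```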
The compute speed of Eos assumed for these GPT-4 training estimates was 18.4 ExaFLOP/s.
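For the day counts in the list, here is a rough sketch of how such estimates can be produced, assuming the common C ≈ 6·N·D approximation for training FLOPs and the 18.4 ExaFLOP/s figure for Eos. This is only an assumption about the method; at full utilization it comes out somewhat below the quoted figures, which look consistent with roughly 75–80% effective utilization or a larger FLOPs-per-parameter-token constant.

```python
# Rough training-time sketch: compute C ~= 6 * N * D FLOPs (a common
# approximation) and divide by Eos's quoted 18.4 ExaFLOP/s.
# `utilization` is a hypothetical knob for real-world efficiency; the
# post's day counts imply something like 75-80% effective utilization
# (or a slightly larger FLOPs-per-token constant).

EOS_FLOPS = 18.4e18      # Eos compute speed quoted above, in FLOP/s
SECONDS_PER_DAY = 86_400

def training_days(params: float, tokens: float, utilization: float = 1.0) -> float:
    flops = 6 * params * tokens                      # approximate training compute
    return flops / (EOS_FLOPS * utilization) / SECONDS_PER_DAY

options = {
    "6T params on ~4T tokens":     (6e12, 4e12),
    "175B params on 18.5T tokens": (175e9, 18.5e12),
    "530B params on 8.5T tokens":  (530e9, 8.5e12),
}
for name, (n, d) in options.items():
    print(f"{name}: ~{training_days(n, d):.0f} days at full utilization")
```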