Data seems to be a bottleneck, so we should expect the number of model parameters to run high to compensate.
Note that an MMLU score of 100% should be achievable with a model the same size as Megatron-Turing NLG and only 2.1x more data than was used for GPT-4, which should be attainable in the near term.