That was just the tech demo. It obviously wasn’t actually trained, just demoed for a few steps to show that the code works. If they had trained it back then, they’d, well, have announced it like OP! OP is what they’ve trained for-reals after further fiddling with that code.
That’s new to me. Any citation for this?
https://arxiv.org/abs/2101.03961
It was not Google but Microsoft: in September 2020 they wrote: “The trillion-parameter model has 298 layers of Transformers with a hidden dimension of 17,408 and is trained with sequence length 2,048 and batch size 2,048.” https://www.microsoft.com/en-us/research/blog/deepspeed-extreme-scale-model-training-for-everyone/
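For what it's worth, the quoted config does land right around a trillion parameters. A rough back-of-the-envelope check, using the standard ~12·L·d² approximation for a dense transformer (4d² for the attention projections plus 8d² for a 4d-wide FFN, ignoring embeddings, biases, and layer norms):

```python
# Sanity check of the DeepSpeed config quoted above:
# 298 layers, hidden dimension 17,408.
layers = 298
hidden = 17_408

# ~12 * L * d^2 dense-transformer parameter approximation
params = 12 * layers * hidden ** 2
print(f"{params:,}")            # 1,083,665,547,264
print(f"{params / 1e12:.2f}T")  # 1.08T
```

So "trillion-parameter" is accurate for the config itself, independent of whether it was ever trained to convergence.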