I think the authors did at least some work to distinguish the eras, but I agree more could be done.
Also, I agree with Stella here that Turing, GPT-J, GShard, and Switch probably fit better in the “large scale” era.