If you tweaked GPT-3 (let’s assume the total parameter count remained the same, so layers were made a little narrower or some such) to have a 30k BPE context, I think the RAM requirements would explode: dense self-attention memory grows quadratically with context length, so going from 2048 to ~30k tokens inflates the attention matrices by a factor of roughly 215, to the point where even the narrowed layers couldn’t fit their forward pass onto a single GPU. You can forget about training it too.
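A rough back-of-envelope sketch of what I mean (my numbers, not anything official): assuming GPT-3 175B’s 96 heads and 96 layers, dense attention, fp16 activations, and batch size 1, just the seq_len × seq_len attention score matrices go from under a gigabyte per layer at 2048 tokens to well over 100 GB per layer at 30k tokens, before counting weights, KV caches, or the MLP activations.

```python
# Back-of-envelope: fp16 attention-score memory for a GPT-3-sized model,
# comparing the original 2048-token context to a hypothetical 30k context.
# Assumes dense (non-sparse) attention and batch size 1.

BYTES_FP16 = 2
N_HEADS = 96    # GPT-3 175B attention heads per layer
N_LAYERS = 96   # GPT-3 175B layers

def attn_score_bytes_per_layer(seq_len: int) -> int:
    """Memory for the seq_len x seq_len score matrices across all heads of one layer."""
    return seq_len * seq_len * N_HEADS * BYTES_FP16

for ctx in (2048, 30_000):
    per_layer_gb = attn_score_bytes_per_layer(ctx) / 1e9
    print(f"context {ctx:>6}: ~{per_layer_gb:7.1f} GB per layer "
          f"(~{per_layer_gb * N_LAYERS:9.1f} GB if all layers were kept resident)")
```

That prints roughly 0.8 GB per layer at 2048 tokens versus ~173 GB per layer at 30k, which is already past any single GPU of that era even before you add optimizer state for training.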