Is the predicted cost for GPT-N parameter improvements based on the “classical Transformer” architecture? Recent variants like the Performer should require substantially less compute, and therefore cost less.
Yes, in general you want to account for hardware and software improvements. From the original post:
Finally, it’s important to note that algorithmic advances are real and important. GPT-3 still uses a somewhat novel and unoptimised architecture, and I’d be unsurprised if we got architectures or training methods that were one or two orders of magnitude more compute-efficient in the next 5 years.
From the summary:
$100B-$1T at current prices, $1B-$10B given estimated hardware and software improvements over the next 5-10 years
The $1B-$10B number is meant to include things like the Performer: it is the current-price estimate reduced by two orders of magnitude, reflecting the combined hardware and software improvements discussed above.
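For intuition on why something like the Performer cuts compute: standard attention materialises an L × L score matrix, so cost grows quadratically in sequence length L, whereas the Performer approximates the softmax kernel with random features so attention can be computed in time linear in L. Below is a minimal NumPy sketch of that idea; the feature map `phi` here is a simplified stand-in for the paper's actual FAVOR+ mechanism, and the names, feature count `m`, and seeds are illustrative choices, not from the original discussion.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: materialises an L x L score matrix -> O(L^2 * d).
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, m=256, seed=0):
    # Performer-style kernel trick (simplified): map Q and K through a random
    # feature map phi so that phi(Q) @ (phi(K).T @ V) approximates softmax
    # attention in O(L * m * d) -- linear in sequence length L.
    d = Q.shape[-1]
    W = np.random.default_rng(seed).standard_normal((d, m))

    def phi(X):
        # Positive random features; a simplified stand-in for FAVOR+.
        X = X / d ** 0.25  # fold in the 1/sqrt(d) softmax temperature
        return np.exp(X @ W - (X ** 2).sum(-1, keepdims=True) / 2) / np.sqrt(m)

    Qp, Kp = phi(Q), phi(K)
    num = Qp @ (Kp.T @ V)          # never forms the L x L matrix
    den = Qp @ Kp.sum(axis=0)      # per-row softmax normaliser
    return num / den[:, None]

L, d = 1024, 64
rng = np.random.default_rng(1)
Q, K, V = rng.standard_normal((3, L, d))
exact = softmax_attention(Q, K, V)
approx = linear_attention(Q, K, V)
print(np.abs(exact - approx).mean())  # approximation error; shrinks as m grows
```

The key point is the parenthesisation: computing `phi(K).T @ V` first yields an m × d matrix, so the L × L attention matrix is never formed, which is where the order-of-magnitude compute savings come from.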