p.b. comments on [Link] Training Compute-Optimal Large Language Models

p.b. 1 Apr 2022 16:53 UTC
This implies that compute-optimal training of Gopher (280B parameters) would have used 16x the data and compute.
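Here is a quick back-of-envelope check of that multiplier. It is a rough sketch that makes two assumptions. First, training compute follows the usual C ≈ 6ND approximation. Second, the compute-optimal tokens-per-parameter ratio is read off Chinchilla itself (70B parameters, 1.4T tokens, so about 20 tokens per parameter). Gopher's 280B parameters and 300B training tokens are the published figures. With this heuristic the result lands a bit above 16x, near 19x. The exact number depends on which of the paper's scaling fits you use, so treat it only as an order-of-magnitude check.

```python
# Back-of-envelope check of the data/compute multiplier for Gopher.
# Assumptions (not from the comment): C ~ 6 * N * D for training FLOPs,
# and a compute-optimal ratio of ~20 tokens per parameter taken from
# Chinchilla's own configuration (70B params, 1.4T tokens).

GOPHER_PARAMS = 280e9        # Gopher parameter count
GOPHER_TOKENS = 300e9        # tokens Gopher was actually trained on
CHINCHILLA_PARAMS = 70e9     # Chinchilla parameter count
CHINCHILLA_TOKENS = 1.4e12   # tokens Chinchilla was trained on


def train_flops(params: float, tokens: float) -> float:
    """Approximate training compute with the common C ~ 6ND rule."""
    return 6 * params * tokens


# Heuristic compute-optimal ratio, about 20 tokens per parameter.
tokens_per_param = CHINCHILLA_TOKENS / CHINCHILLA_PARAMS

# Tokens a 280B-parameter model "should" see under that ratio.
optimal_tokens = tokens_per_param * GOPHER_PARAMS

# At a fixed parameter count, compute scales linearly with tokens,
# so the data and compute multipliers come out identical.
data_ratio = optimal_tokens / GOPHER_TOKENS
compute_ratio = train_flops(GOPHER_PARAMS, optimal_tokens) / train_flops(
    GOPHER_PARAMS, GOPHER_TOKENS
)

print(f"data multiplier:    {data_ratio:.1f}x")
print(f"compute multiplier: {compute_ratio:.1f}x")
```

Keeping the parameter count fixed is what makes "data and compute" grow by the same factor here. Letting N shrink as well, as the paper's optimal frontier does, gives a different trade-off.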
It also implies that, for a while, further scaling will come from compute and data alone rather than from larger models.
All the nice graphs of model size over time will now get an ugly kink.
All the extrapolations of when parameter counts reach the human (neocortex) neuron count are off.
Really looking forward to reading the paper.