This implies that optimal training of Gopher should have used 16x the data and compute.
It also implies that, for a while, further scaling will come from more compute and data only, not from more parameters.
All the nice graphs will now get an ugly kink.
All the extrapolations to the human (neocortex) neuron count are off.
Really looking forward to reading the paper.
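
For anyone who wants to sanity-check the 16x figure, here is a rough back-of-the-envelope sketch. It assumes the common approximation that training compute is about 6 * parameters * tokens, and it takes Chinchilla's tokens-per-parameter ratio as the target for a compute-optimal run. The model sizes and token counts are the publicly reported ones, not numbers from this comment, and the function names are made up for illustration.

```python
# Back-of-the-envelope check. All constants are assumptions taken from
# publicly reported figures, not from the comment above.
GOPHER_PARAMS = 280e9        # Gopher parameter count
GOPHER_TOKENS = 300e9        # Gopher training tokens
CHINCHILLA_PARAMS = 70e9     # Chinchilla parameter count
CHINCHILLA_TOKENS = 1.4e12   # Chinchilla training tokens


def train_flops(params: float, tokens: float) -> float:
    """Approximate training compute with the rule of thumb C ~ 6 * N * D."""
    return 6.0 * params * tokens


def data_multiplier(params: float, tokens: float, tokens_per_param: float) -> float:
    """How many times more tokens a model of fixed size would need
    to reach the given tokens-per-parameter ratio."""
    return tokens_per_param * params / tokens


# Assumed target ratio: whatever Chinchilla itself used (about 20 tokens per parameter).
target_ratio = CHINCHILLA_TOKENS / CHINCHILLA_PARAMS

# Extra data Gopher would need at its existing size.
gopher_data_x = data_multiplier(GOPHER_PARAMS, GOPHER_TOKENS, target_ratio)

# At fixed model size, compute scales linearly with tokens,
# so the compute multiplier equals the data multiplier.
gopher_compute_x = (
    train_flops(GOPHER_PARAMS, GOPHER_TOKENS * gopher_data_x)
    / train_flops(GOPHER_PARAMS, GOPHER_TOKENS)
)

print(f"tokens per parameter (target): {target_ratio:.1f}")
print(f"data multiplier for Gopher:    {gopher_data_x:.1f}x")
print(f"compute multiplier for Gopher: {gopher_compute_x:.1f}x")
```

Under these assumptions the multiplier comes out somewhat above 16x but in the same ballpark. The exact value depends on which tokens-per-parameter ratio you treat as optimal.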