We won’t be seeing optimally trained dense models with hundreds of trillions of parameters anytime soon.
We won’t be seeing them Chinchilla-trained, of course, but that’s a completely different claim. Chinchilla scaling is obviously suboptimal compared to something better, just like all scaling laws before it have been. And they’ve only gone one direction: down.
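For a sense of scale, here is a rough back-of-the-envelope sketch of what Chinchilla-training such a model would take. It assumes two rules of thumb that are not from this post: roughly 20 training tokens per parameter, and training compute of about 6 × parameters × tokens FLOPs. The function name and the 100-trillion figure (the low end of "hundreds of trillions") are picked purely for illustration.

```python
def chinchilla_budget(n_params, tokens_per_param=20.0):
    """Rough Chinchilla-style training budget for a dense model.

    Assumptions (rules of thumb, not exact fits):
      - compute-optimal data is about `tokens_per_param` tokens per parameter
      - training compute is about 6 * parameters * tokens FLOPs
    """
    tokens = n_params * tokens_per_param
    flops = 6.0 * n_params * tokens
    return tokens, flops


# Low end of "hundreds of trillions": 100 trillion dense parameters.
tokens, flops = chinchilla_budget(100e12)
print(f"{tokens:.1e} tokens, {flops:.1e} FLOPs")
```

Under those assumptions the answer lands around 2 quadrillion training tokens and on the order of 10^30 FLOPs, which is why the "Chinchilla-trained" version of the claim is the easy one to agree with. A better scaling recipe would change the `tokens_per_param` ratio, not the basic arithmetic.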