If base model scaling has indeed broken down, I wonder how this manifests. Does the Chinchilla scaling law stop holding beyond a certain model size? Or does it still hold, but further reductions in prediction loss no longer bring a proportional increase in benchmark performance? The latter could mean that the quality of the (largely human-generated) training data is the bottleneck.