That was quite a while ago, and is not a very strongly worded claim. I think there was also evidence that Chinchilla got a constant factor wrong and people kept discovering that you wanted a substantially larger data:parameter multiplier, which might fully account for any ‘slight bending’ back then. Bending often just means you got a hyperparameter wrong and need to tune it better. (It’s a lot easier to break scaling than to improve it, so bending for the worse is not too interesting, while bending in the opposite direction is much more interesting.)