Better meaning more capability per unit of compute? If so, how can we be confident that it’s better than Chinchilla?
I can see an argument that it should be at least as good: if they were throwing so much money at it, they would surely follow the best practice currently known. But is there evidence that they figured out how to do things more efficiently than had ever been done before?