What is that formula based on? I can’t find anything by googling. I thought it might be from the OpenAI paper "Scaling Laws for Neural Language Models", but I can’t find it with Ctrl+F.
It’s in the figure.