Here’s what the curves look like if you fit them to the PaLM data-points as well as the GPT-3 data-points.
Keep in mind that this is still based on Kaplan scaling laws.
Linear: [figure: linear fit]
Logistic: [figure: logistic fit]
The Chinchilla scaling laws would predict faster progress.
(But we wouldn’t observe that on these graphs because they weren’t trained Chinchilla-style, of course.)
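For anyone who wants to try this kind of fit themselves, here is a minimal sketch of a linear and a logistic fit to the same points. It is not the code behind the graphs: the data below are hypothetical placeholders (not the GPT-3 or PaLM numbers), the x-axis is assumed to be some scaling variable such as log10 of training compute, and the logistic curve is fitted by the simple shortcut of a least-squares line through the logit-transformed values.

```python
import numpy as np

# Hypothetical placeholder data, NOT the GPT-3 or PaLM numbers.
# x: a scaling variable (assumed here: log10 of training compute)
# y: benchmark accuracy as a fraction strictly between 0 and 1
x = np.array([20.0, 21.0, 22.0, 23.0, 24.0])
y = np.array([0.12, 0.21, 0.35, 0.52, 0.68])


def fit_linear(x, y):
    """Least-squares line y = a*x + b."""
    a, b = np.polyfit(x, y, 1)
    return lambda t: a * np.asarray(t) + b


def fit_logistic(x, y):
    """Logistic curve y = 1 / (1 + exp(-(a*x + b))).

    Fitted by least squares on logit(y), so it needs 0 < y < 1.
    """
    logit = np.log(y / (1.0 - y))
    a, b = np.polyfit(x, logit, 1)
    return lambda t: 1.0 / (1.0 + np.exp(-(a * np.asarray(t) + b)))


linear = fit_linear(x, y)
logistic = fit_logistic(x, y)

# Extrapolate both fits beyond the observed range.
for t in (25.0, 27.0, 30.0):
    print(f"x={t}: linear={float(linear(t)):.2f}, "
          f"logistic={float(logistic(t)):.2f}")
```

The two fits agree fairly closely inside the observed range but diverge when extrapolated: the line keeps climbing past an accuracy of 1, while the logistic curve flattens out below it.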