Lukas Finnveden comments on PaLM in “Extrapolating GPT-N performance”

Lukas Finnveden 5 Jun 2022 21:45 UTC
LW: 11 AF: 5
Here’s what the curves look like if you fit them to the PaLM data-points as well as the GPT-3 data-points.
Keep in mind that this is still based on Kaplan scaling laws. The Chinchilla scaling laws would predict faster progress.
Linear: [figure]
Logistic: [figure]
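For readers who want to try this kind of fit themselves, here is a minimal sketch of fitting a linear and a logistic curve to the same points and extrapolating both. It is not the code behind the plots above. The data values are made-up placeholders rather than GPT-3 or PaLM results, the x-axis is assumed to be log10 of training compute, and the logistic fit assumes asymptotes fixed at 0 and 1 so that it can be done by least squares on the logit of performance. All function and variable names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x. Returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def fit_logistic(xs, ys):
    """Fit y = 1 / (1 + exp(-(a + b*x))) by least squares on logit(y).

    Assumes every y lies strictly between 0 and 1 and that the curve's
    asymptotes are exactly 0 and 1 (an assumption of this sketch).
    """
    logits = [math.log(y / (1 - y)) for y in ys]
    return fit_line(xs, logits)


def predict_linear(params, x):
    a, b = params
    return a + b * x


def predict_logistic(params, x):
    a, b = params
    return 1 / (1 + math.exp(-(a + b * x)))


# Placeholder data, NOT real measurements: x is an assumed log10(compute),
# y is an assumed benchmark score normalised to the range (0, 1).
log_compute = [20.0, 21.0, 22.0, 23.0, 24.0]
performance = [0.10, 0.18, 0.30, 0.45, 0.60]

linear_params = fit_line(log_compute, performance)
logistic_params = fit_logistic(log_compute, performance)

# Extrapolate both fits beyond the observed range.
for x in (25.0, 27.0, 30.0):
    print(x, predict_linear(linear_params, x), predict_logistic(logistic_params, x))
```

The two fits agree fairly closely inside the observed range but diverge when extrapolated: the linear fit eventually predicts scores above 1, while the logistic fit saturates below 1.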
What links here?
- PaLM-2 & GPT-4 in “Extrapolating GPT-N performance” by Lukas Finnveden (30 May 2023 18:33 UTC; 55 points)
- gwern 6 Jun 2022 0:42 UTC
  LW: 4 AF: 1
  
  > The Chinchilla scaling laws would predict faster progress.
  
  (But we wouldn’t observe that on these graphs because they weren’t trained Chinchilla-style, of course.)