gwern comments on Parameter counts in Machine Learning

gwern 28 Jun 2021 21:10 UTC
LW: 2 AF: 1
Could it be inefficient scaling? Most work that does not explicitly plan with scaling laws seems to overestimate the compute to spend per parameter, and so trains models that are too small. Anyone want to try applying Jones 2021 to see if AlphaZero was scaled wrong?
- gwern 28 Jun 2024 2:45 UTC
  LW: 6 AF: 4
  Ben Adlam (via Maloney et al 2022) makes an interesting point: if you plot parameters vs training data, it’s a nearly perfect 1:1 ratio historically. (He doesn’t seem to have published anything formally on this.)
  - Jsevillamol 28 Jun 2024 18:05 UTC
    LW: 2 AF: 1
    We have conveniently just updated our database if anyone wants to investigate this further!
    https://epochai.org/data/notable-ai-models
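
For anyone who wants to check the 1:1 claim against a dataset like the one linked above, here is a minimal sketch of one way to do it: fit a least-squares line to log10(training data) against log10(parameters). Under this reading (an assumption on my part), a slope near 1 means data has grown in proportion to parameters, and an intercept near 0 means the ratio itself is close to 1:1. The function name `loglog_slope` and the sample values are hypothetical and made up for illustration; they are not taken from the Epoch database, whose column names and units you would need to look up yourself.

```python
import math

def loglog_slope(params, data_sizes):
    """Least-squares fit of log10(data_sizes) = slope * log10(params) + intercept.

    Both inputs are sequences of positive numbers of equal length.
    Returns (slope, intercept).
    """
    xs = [math.log10(p) for p in params]
    ys = [math.log10(d) for d in data_sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Standard ordinary-least-squares formulas for a single predictor.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up illustrative values (NOT real model data): training data is
# always exactly twice the parameter count.
params = [1e6, 1e7, 1e8, 1e9]
data_sizes = [2e6, 2e7, 2e8, 2e9]

slope, intercept = loglog_slope(params, data_sizes)
print(f"slope={slope:.3f}, data per parameter at fit={10 ** intercept:.3f}")
```

With real data you would first drop rows missing either value, and it is probably worth fitting separate domains (language, vision, games) separately, since "training data" is counted in different units across them.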