Um, looking at the scaling curves and seeing diminishing returns? I think this pattern is very clear for metrics like general text prediction (cross-entropy loss on large text corpora), less clear for standard capability benchmarks, and still to be determined for the complex tasks that may be economically valuable.
General text prediction: see Chinchilla and Fig. 1 of the GPT-4 technical report (rough numeric sketch below)
Capability benchmarks: see the Epoch post, the ~4th figure here
Complex tasks: see the GDM dangerous capability evals (Fig. 9, which indicates Gemini Ultra is not much better than Gemini Pro, despite likely being trained on >5x the compute, though the training details aren’t public)
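To make the cross-entropy point concrete, here is a rough numeric sketch using the parametric fit from the Chinchilla paper, L(N, D) = E + A/N^α + B/D^β, with the published estimates from Hoffmann et al. The constants and the compute-optimal allocation below are approximations I’m supplying for illustration, not numbers from the links above; the point is the shape of the curve, not the exact values.

```python
# Rough sketch of diminishing returns in cross-entropy loss with training compute,
# using the Chinchilla parametric fit L(N, D) = E + A/N^alpha + B/D^beta
# (approximate published estimates from Hoffmann et al.).
E, A, B = 1.69, 406.4, 410.7
alpha, beta = 0.34, 0.28

def loss(n_params: float, n_tokens: float) -> float:
    return E + A / n_params**alpha + B / n_tokens**beta

def compute_optimal_loss(flops: float) -> float:
    # Crude compute-optimal allocation: ~20 tokens per parameter, C ~ 6*N*D FLOPs.
    n_params = (flops / (6 * 20)) ** 0.5
    n_tokens = 20 * n_params
    return loss(n_params, n_tokens)

for c in [1e24, 1e25, 1e26]:  # each step is 10x more training compute
    l = compute_optimal_loss(c)
    print(f"C = {c:.0e} FLOPs  ->  loss ≈ {l:.3f}  (reducible part ≈ {l - E:.3f})")
```

Each 10x of compute removes a roughly constant fraction of the remaining reducible loss, so the absolute improvement per 10x keeps shrinking; whether those shrinking loss gaps correspond to small or large capability gaps is the open question.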
To be clear, I’m not saying that a $100m model will be very close to a $1b model. I’m saying the trends indicate they will be much closer than you’d expect if you only considered how big a 10x difference in training compute sounds, without being aware of the empirical pattern of diminishing returns. Those trends point to a relatively small difference, but we don’t have nearly enough data on complex, economically valuable tasks to be confident about this.
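As a back-of-the-envelope version of the same point (my arithmetic, assuming training dollars convert roughly proportionally into FLOPs and compute-optimal allocation): under the power-law fit above, multiplying compute by 10 multiplies the parameter term of the reducible loss by about 10^(-α/2) ≈ 0.68 and the data term by about 10^(-β/2) ≈ 0.72, so the 10x-larger run closes only around 30% of the remaining gap to the irreducible loss, regardless of where on the curve you start.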
Diminishing returns in loss are not diminishing returns in capabilities. And benchmarks tend to saturate, so diminishing returns are baked in if you look at those.
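As a toy illustration of that saturation point (my own sketch, with a made-up sigmoid link between compute and score, not data from any real benchmark): even if each 10x of compute buys a constant amount of underlying capability, the measured score gain per 10x shrinks as the benchmark approaches its ceiling.

```python
# Toy illustration of benchmark saturation: assume "capability" improves by a
# fixed amount per 10x of training compute, and the measured benchmark score is
# a sigmoid of capability. The score gain per 10x then shrinks near the ceiling
# even though the underlying per-10x improvement is constant.
import math

def benchmark_score(capability: float) -> float:
    return 1 / (1 + math.exp(-capability))  # hypothetical score in [0, 1]

prev = None
for decades in range(6):               # 0, 1, ..., 5 factors of 10 in compute
    capability = 1.0 * decades         # assumed +1.0 capability per 10x compute
    score = benchmark_score(capability)
    delta = "" if prev is None else f"  (gain {score - prev:+.3f})"
    print(f"{10**decades:>7}x compute: score = {score:.3f}{delta}")
    prev = score
```

A flattening benchmark curve can therefore reflect the metric’s ceiling as much as the model’s returns to scale.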
I am not saying that there aren’t diminishing returns to scale; I just haven’t seen anything definitive yet.