I expect there is still tons of low-hanging fruit available in LLM capabilities land. You could call this “algorithmic progress” if you want. This will decrease the compute cost necessary to get a given level of performance, thus raising the AI capability level accessible to less-resourced open-source AI projects.
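As a back-of-the-envelope for the mechanism in the paragraph above, here is a sketch using an external estimate (not a claim from this thread): Epoch's analysis of algorithmic progress in language models suggests the compute needed to reach a fixed level of performance has halved roughly every 8 months, with wide error bars. The constant halving time is a strong simplifying assumption.

```python
def compute_needed_factor(months_elapsed: float, halving_months: float = 8.0) -> float:
    """Fraction of the original training compute needed to match a fixed
    capability level after `months_elapsed` months of algorithmic progress,
    assuming a constant halving time (a big assumption)."""
    return 0.5 ** (months_elapsed / halving_months)

# Under this assumption, after two years a fixed capability level
# needs about an eighth of the original training compute.
factor = compute_needed_factor(24)
print(f"compute needed after 24 months: {factor:.3f}x the original")
```

Each such halving lowers the compute bar for reproducing a given capability level, which is the sense in which algorithmic progress raises the capability level accessible to less-resourced projects.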
What’s your argument for that?
Don’t you expect many of those improvements to remain closed-source from here on out, benefitting the teams that developed them at great (average) expense? And even the ones that are published freely will benefit the leaders just as much as their open-source chasers.
Um, looking at the scaling curves and seeing diminishing returns? I think this pattern is very clear for metrics like general text prediction (cross-entropy loss on large texts), less clear for standard capability benchmarks, and to-be-determined for complex tasks which may be economically valuable.
General text prediction: see the Chinchilla paper (Hoffmann et al., 2022) and Fig. 1 of the GPT-4 technical report
Capability benchmarks: see the Epoch post, roughly the 4th figure, here
Complex tasks: see the GDM dangerous capability evals (Fig. 9 indicates Gemini Ultra is not much better than Gemini Pro, despite Ultra likely being trained on >5x the compute; training details are not public)
To be clear, I’m not saying that a $100m model will be very close to a $1b model. I’m saying the trends indicate they will be much closer than you would expect from the raw size of a 10x difference in training compute, once you account for the empirical pattern of diminishing returns. The trends suggest a relatively small difference, but we don’t have nearly enough data on complex, economically valuable tasks to be confident about this.
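To make the diminishing-returns point concrete, here is a rough sketch using the Chinchilla parametric fit L(N, D) = E + A/N^α + B/D^β, with the coefficients reported by Hoffmann et al. (2022). The FLOP budgets below and the equal-N-and-D allocation are simplifying assumptions of mine, not anything from this thread:

```python
def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Chinchilla parametric fit for pretraining cross-entropy loss.

    Coefficients are those reported in Hoffmann et al. (2022); treat them
    as illustrative, not as a prediction for any particular model.
    """
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

def loss_at_budget(flops: float) -> float:
    """Loss at a crude compute-optimal allocation.

    Uses C ~= 6*N*D and, as a simplification, N = D = sqrt(C/6); the
    Chinchilla-optimal ratio is closer to D ~= 20*N, but that does not
    change the qualitative picture of diminishing returns.
    """
    n = (flops / 6.0) ** 0.5
    return chinchilla_loss(n, n)

# Assumed budgets: ~1e24 FLOP for the "$100m" run, 10x that for the "$1b" run.
small_loss = loss_at_budget(1e24)
big_loss = loss_at_budget(1e25)
gap = small_loss - big_loss
print(f"loss {small_loss:.3f} -> {big_loss:.3f} (gap {gap:.3f} nats/token)")
```

On this fit, 10x the compute buys well under 0.1 nats of improvement at these scales — small in absolute terms, which is the sense in which the curves show diminishing returns.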
Diminishing returns in loss are not diminishing returns in capabilities. And benchmarks tend to saturate, so diminishing returns are baked in if you look at those.
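One way to cash out this objection is a toy model (entirely my construction, with made-up numbers): suppose success on some downstream task is a logistic function of per-token loss. Then identical loss improvements buy very different capability gains depending on where on the curve a model sits:

```python
import math

def toy_capability(loss: float, midpoint: float = 1.9, steepness: float = 30.0) -> float:
    """Hypothetical task success rate as a logistic function of loss.

    All three numbers (midpoint, steepness, the losses below) are
    invented for illustration; nothing here is fit to real data.
    """
    return 1.0 / (1.0 + math.exp(steepness * (loss - midpoint)))

# The same 0.08-nat loss improvement...
flat_region = toy_capability(2.30) - toy_capability(2.38)   # far from the midpoint
steep_region = toy_capability(1.90) - toy_capability(1.98)  # near the midpoint
print(f"capability gain far from midpoint: {flat_region:.6f}")
print(f"capability gain near midpoint:     {steep_region:.3f}")
```

In the flat region the gain is negligible; near the midpoint the same loss improvement produces a large jump in task success — so flat loss curves need not imply flat capability curves.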
I’m not saying there aren’t diminishing returns to scale; I just haven’t seen anything definitive yet.