Crucially, power-law scaling is actually pretty bad: it means that performance grows only slowly with scale. A model with twice as many parameters or twice as much data does not perform twice as well. These diminishing returns to intelligence are of immense importance for forecasting AI risks, since whether FOOM is possible depends heavily on the returns to increasing intelligence in the range around the human level.
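To put a rough number on "relatively slowly" (an illustration only, using a parameter-scaling exponent in the ballpark of the one reported by Kaplan et al. (2020), not a claim about any particular model):

```latex
% Illustrative power-law scaling of loss with parameter count N.
% \alpha_N \approx 0.076 is roughly the exponent reported by Kaplan et al. (2020).
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad \alpha_N \approx 0.076,
\qquad\text{so}\qquad \frac{L(2N)}{L(N)} = 2^{-\alpha_N} \approx 0.95 .
```

Under that exponent, doubling the parameter count cuts the loss by only about 5%.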
Something I’m too sleep deprived to think clearly on right now but want your take on.
Which best describes these scaling effects:
(a) Sublinear cumulative returns to cognitive investment from computational resources (model size, training data size, training compute budget)
(b) Superlinearly diminishing marginal returns to cognitive investment from computational resources
While they are related, they aren’t quite the same: e.g. logarithmically growing cumulative returns are unbounded, while exponentially diminishing marginal returns imply bounded cumulative returns (geometric series with ratio < 1 converge). And I haven’t yet played around with the maths enough to have confident takes on what a particular kind of cumulative returns implies about a particular kind of marginal returns, and vice versa.
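A minimal sketch of those two cases (illustrative functional forms I’ve picked, not anything derived from the scaling-law papers):

```latex
% Power-law diminishing marginal returns r(x): cumulative returns R(x) are unbounded.
r(x) = \frac{1}{x} \;\Rightarrow\; R(x) = \int_1^x \frac{dt}{t} = \log x \;\to\; \infty .
% Exponentially diminishing marginal returns: cumulative returns are bounded
% (the continuous analogue of a geometric series with ratio < 1 converging).
r(x) = c\,e^{-x} \;\Rightarrow\; R(x) = \int_0^x c\,e^{-t}\,dt = c\,(1 - e^{-x}) \;\le\; c .
```

So the two framings pick out overlapping but not identical families of curves.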
For now, I think I want to distinguish between those two terms (at least until I’ve worked out the maths and understand how they relate).
I tend to think of the scaling laws as sublinear cumulative returns (a la algorithms with superlinear time/space complexity [where returns are measured by the problem sizes the system can solve with a given compute budget]), but you’re way more informed on this than me (I cannot grok the scaling law papers).
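A toy version of that analogy (hypothetical numbers; I’m assuming "returns" means the largest problem size an O(n^2) algorithm can handle within a given compute budget):

```python
# Toy illustration (assumption): "returns" = largest problem size n solvable
# within a compute budget, for an algorithm whose cost grows like n**2.

def max_solvable_size(compute_budget: float, cost_exponent: float = 2.0) -> float:
    """Largest problem size n with n**cost_exponent <= compute_budget."""
    return compute_budget ** (1.0 / cost_exponent)

for budget in [1e6, 2e6, 4e6, 8e6]:
    print(f"budget={budget:>9.0f}  max n={max_solvable_size(budget):8.1f}")
# Each doubling of the budget grows the solvable size by only ~1.41x --
# sublinear cumulative returns to compute.
```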