If this is true, then this would mean that the scaling laws would highly underestimate the effects of scale, since the exponential increase in the number of tasks solvable given a unit of log-loss would precisely counteract the logarithmic increase in parameter count, meaning that there would be a linear relationship between the parameter count and the number of tasks that can be solved.
Does parameter count increase logarithmically with unit log loss? Is this a typo, or am I just confused about this?