Note that if the network's error converges towards the irreducible error like a negative exponential (on a plot with the reducible error on the y-axis), that curve would be a straight line on a plot with the logarithm of the reducible error on the y-axis.
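(A minimal sketch of that claim, with $\epsilon_0$ and $k$ as placeholder constants: if the reducible error falls off as $\epsilon(c) = \epsilon_0 e^{-kc}$ in compute $c$, then $\log \epsilon(c) = \log \epsilon_0 - kc$, which is a straight line in $c$ when the y-axis shows $\log \epsilon$.)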
Was a little confused by this note. This does not apply to any of the graphs in the post, right? (Since you plot the straight reducible error on the y-axis, and not its logarithm, as I understand.)
Right, this does not apply to these graphs. It’s just a roundabout way of saying that the upper ends of s-curves (on a linear-log scale) eventually look roughly like power laws (on a linear-linear scale). We do have some evidence that errors are typically power laws in compute (and in size and data), so I wanted to emphasize that s-curves are in line with that trend.
Thanks. Still not sure I understand though:
> It’s just a roundabout way of saying that the upper ends of s-curves (on a linear-log scale) eventually look roughly like power laws (on a linear-linear scale).
Doesn’t the upper end of an s-curve plateau to an asymptote (on any scale), which a power law does not (on any scale)?
Right, sorry. The power law is a function from compute to reducible error, which goes towards 0. This post’s graphs have the (achievable) accuracy on the y-axis, where error = 1 - accuracy (plus or minus a constant to account for achievability/reducibility). So a more accurate statement would be “the lower end of an inverted s-curve [a z-curve?] (on a linear-log scale) eventually looks roughly like a power law (on a linear-linear scale)”.
In other words, a power law does have an asymptote, but it’s always an asymptote towards 0. So you need to transform the s-curve as 1 - s to get it to also asymptote towards 0.
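To spell that out with one illustrative parametrization (just a sketch, with $\alpha$ and $\beta$ as placeholder constants, not necessarily the exact fit behind the graphs): suppose achievable accuracy is a sigmoid in log-compute, $a(c) = \sigma(\alpha \ln c + \beta)$. Then the reducible error is

$1 - a(c) = \dfrac{1}{1 + e^{\alpha \ln c + \beta}} = \dfrac{1}{1 + e^{\beta} c^{\alpha}} \approx e^{-\beta} c^{-\alpha}$ for large $c$,

so the tail of the inverted s-curve decays roughly like a power law in $c$ and asymptotes towards 0, as described above.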
Ah, gotcha. Thanks!