Ethan finds empirically that neural network scaling laws (performance versus model size, dataset size, and other quantities) are characterised by functions that look piecewise linear on a log-log plot, and postulates that a “sharp left turn” describes a transition from a slower to a faster scaling regime. He also postulates that such a transition might be predictable in advance using his functional form for scaling.
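As a rough illustration of the idea (not Ethan's published parameterisation; the functional form, parameter names, and data below are assumptions made up for this sketch), here is a smoothly broken power law that looks piecewise linear on a log-log plot, with one "break" between a slower and a faster scaling regime, fitted with a standard curve-fitting routine:

```python
# Illustrative sketch only: a smoothly broken power law with a single break,
# i.e. a curve that looks piecewise linear on a log-log plot. The functional
# form and parameters here are assumptions for illustration, not Ethan's
# exact published form.
import numpy as np
from scipy.optimize import curve_fit

def broken_power_law(x, a, c0, c1, d, f):
    # Log-log slope is roughly -c0 for x << d and -(c0 + c1) for x >> d;
    # d locates the break and f controls how sharp the transition is.
    return a * x ** (-c0) * (1.0 + (x / d) ** (1.0 / f)) ** (-c1 * f)

# Synthetic "loss vs. scale" data with a break at x = 1e3 (made-up values).
rng = np.random.default_rng(0)
x = np.logspace(0, 6, 60)
y = broken_power_law(x, a=2.0, c0=0.05, c1=0.3, d=1e3, f=0.5)
y *= np.exp(rng.normal(scale=0.02, size=x.size))  # multiplicative noise

# Fit the form to the data; fitting only pre-break points and extrapolating
# would be one way to test whether the faster regime is predictable in advance.
popt, _ = curve_fit(broken_power_law, x, y,
                    p0=[1.0, 0.1, 0.1, 1e2, 1.0], maxfev=20000)
print("fitted (a, c0, c1, d, f):", popt)
```

The point of the sketch is just that such a form has two asymptotic log-log slopes joined by a smooth break, so in principle one can fit it to early data and ask whether the later, faster regime is already implied by the fit.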