Am I just inexperienced or confused, or is this paper using a lot of words to say effectively very little? Sure, this functional form works fine for a given set of regimes of scaling, but it effectively gives you no predictive ability to determine when the next break will occur.
Sorry if this is overly confrontational, but I keep seeing this paper on Twitter and elsewhere and I’m not sure I understand why.
When f (in equation 1 of the paper, https://arxiv.org/abs/2210.14891 , not the video) of the next break is sufficiently large, it does give you predictive ability to determine when that next break will occur, though the number of seeds needed to get such predictive ability is very large. When f of the next break is sufficiently small (& nonnegative), it does not give you predictive ability to determine when that next break will occur.
Play around with fi in this code to see what I mean:
https://github.com/ethancaballero/broken_neural_scaling_laws/blob/main/make_figure_1__decomposition_of_bnsl_into_power_law_segments.py#L25-L29
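To make the point concrete, here is a minimal sketch (not the repo's script; the constants a, b, c0, c1 and the break location d1 are made-up illustrative values) of equation 1 with a single break, showing how the size of f_1 controls how far in advance of x = d_1 the break becomes visible in the curve:

```python
# Sketch of BNSL equation 1 with one break:
#   y = a + b * x^(-c0) * prod_i (1 + (x / d_i)^(1/f_i))^(-c_i * f_i)
# Small f_1 -> the break is sharp and the curve hugs the first power law
# until essentially x = d_1, so extrapolation from before the break can't see it coming.
# Large f_1 -> the transition is gradual, so the curve starts bending well before d_1.
import numpy as np
import matplotlib.pyplot as plt

def bnsl(x, a, b, c0, breaks):
    """Broken neural scaling law; `breaks` is a list of (c_i, d_i, f_i) tuples."""
    y = b * x ** (-c0)
    for c_i, d_i, f_i in breaks:
        y *= (1.0 + (x / d_i) ** (1.0 / f_i)) ** (-c_i * f_i)
    return a + y

x = np.logspace(0, 6, 500)
d1 = 1e4  # illustrative break location
for f1 in (0.05, 0.5, 2.0):
    y = bnsl(x, a=0.0, b=1.0, c0=0.3, breaks=[(0.5, d1, f1)])
    plt.loglog(x, y, label=f"f_1 = {f1}")
plt.axvline(d1, linestyle="--", color="gray", label="break location d_1")
plt.xlabel("x (e.g. training compute or dataset size)")
plt.ylabel("y (e.g. test loss)")
plt.legend()
plt.show()
```

With f_1 = 2.0 the slope is already changing an order of magnitude or more before d_1, whereas with f_1 = 0.05 the curve looks like a single power law right up to the break, which is the distinction the reply above is drawing.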