Isn’t an intercept offset already usually included in scaling-law fits, so it can’t be misleading anyone? I didn’t think anyone was fitting scaling laws that allow loss to go to exactly 0 with no intrinsic-entropy term.
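To make that concrete, here’s a toy fit in the standard form L(C) = E + A·C^(−α), where the intercept E (the irreducible loss) is a free parameter of the fit rather than being forced to zero. All the constants below are invented for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(log10_C, E, A, alpha):
    # L(C) = E + A * C^(-alpha), parameterized in log10(compute) for stability;
    # E is the irreducible-loss intercept, a free parameter of the fit.
    return E + A * 10.0 ** (-alpha * log10_C)

rng = np.random.default_rng(0)
log10_C = np.linspace(18, 24, 12)          # hypothetical FLOPs budgets
loss = scaling_law(log10_C, 1.7, 1e3, 0.15) + rng.normal(0, 0.01, 12)

(E_hat, A_hat, alpha_hat), _ = curve_fit(
    scaling_law, log10_C, loss, p0=[1.0, 100.0, 0.1], maxfev=20000
)
print(f"fitted irreducible loss E ≈ {E_hat:.3f}")  # should come back near 1.7
```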
Couldn’t it just be that the intercept has been extrapolated incorrectly, perhaps due to misspecification at the lower end of the scaling law?
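As a hedged sketch of what that could look like: suppose the true law has a second power-law term that only matters at small scale, but we fit the usual single-term form over a range that includes small models. The fitted floor then has to absorb the unmodeled curvature. All constants are invented, and the sign of the resulting bias depends on the details:

```python
import numpy as np
from scipy.optimize import curve_fit

def one_term(x, E, A, alpha):              # x = log10(compute)
    return E + A * 10.0 ** (-alpha * x)

def two_term(x, E, A, alpha, B, beta):     # assumed "true" generating process
    return E + A * 10.0 ** (-alpha * x) + B * 10.0 ** (-beta * x)

x = np.linspace(14, 22, 15)                # range includes small-scale models
true_E = 1.7
y = two_term(x, true_E, 1e3, 0.15, 1e6, 0.35)  # second term matters at low end

# Fitting the misspecified one-term form: the floor absorbs residual curvature.
(E_hat, *_), _ = curve_fit(one_term, x, y, p0=[1.0, 100.0, 0.1], maxfev=20000)
print(f"true floor {true_E}, fitted floor {E_hat:.3f}")
```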
Or, I guess, people often combine multiple scaling laws to get optimal performance as a function of compute. That introduces a lot of complexity, and I’m not sure where it leaves us in terms of realistic errors.
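For instance, a Chinchilla-style joint law L(N, D) = E + A/N^a + B/D^b gets turned into a compute curve by minimizing over the parameter/data split at each budget C ≈ 6ND. A rough sketch, using approximately the published Chinchilla constants (treating them as exact is itself an assumption):

```python
import numpy as np

# Roughly the Hoffmann et al. (2022) Chinchilla fits; exactness is assumed.
E, A, a, B, b = 1.69, 406.4, 0.34, 410.7, 0.28

def loss(N, D):
    return E + A / N**a + B / D**b

for C in [1e20, 1e22, 1e24]:
    N = np.logspace(7, 13, 2000)           # candidate parameter counts
    D = C / (6 * N)                        # tokens implied by C ≈ 6*N*D
    L = loss(N, D)
    i = np.argmin(L)
    print(f"C={C:.0e}: optimal N ≈ {N[i]:.2e}, min loss ≈ {L[i]:.3f}")
```

Any error in the joint fit then propagates through that argmin, so the compute-optimal curve inherits, and can compound, misestimation in either term.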
Well, I suppose it could be misspecification, but if the intercept itself were somehow misestimated (despite scaling-law fits usually being eerily exact), is there some reason the error would usually run in the direction of underestimating the intercept, badly enough that we could actually be near hitting perfect performance and the divergence become noticeable? It seems like it could just as easily overestimate it and produce spuriously good-looking performance as later models ‘overperform’.
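That symmetry intuition can be checked with a toy Monte Carlo: under a correctly specified law and symmetric noise, refitting on many simulated datasets scatters the estimated intercept on both sides of the truth. Constants are invented, as before:

```python
import numpy as np
from scipy.optimize import curve_fit

def law(x, E, A, alpha):                   # x = log10(compute)
    return E + A * 10.0 ** (-alpha * x)

rng = np.random.default_rng(0)
x = np.linspace(18, 24, 12)
clean = law(x, 1.7, 1e3, 0.15)

# Refit under symmetric noise many times and tally which side the
# intercept estimate lands on.
hats = []
for _ in range(200):
    y = clean + rng.normal(0, 0.01, x.size)
    (E_hat, *_), _ = curve_fit(law, x, y, p0=[1.0, 100.0, 0.1], maxfev=20000)
    hats.append(E_hat)

hats = np.asarray(hats)
print(f"underestimates: {(hats < 1.7).mean():.0%}, "
      f"overestimates: {(hats > 1.7).mean():.0%}")
```

In a setup like this the split should come out roughly even (nonlinear fits can carry some bias, but nothing that favors underestimation specifically), which is the point: noise alone gives no reason to expect the misleading direction.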
I suppose that is logical enough.