Well, I suppose it could be misspecification, but if there were some sort of misestimation of the intercept itself (despite the scaling law fits usually being eerily exact), is there some reason it would usually be in the direction of underestimating the intercept badly enough that we could actually be near hitting perfect performance and the divergence would become noticeable? It seems like it could just as easily overestimate it and produce spuriously good-looking performance as later models ‘overperform’.
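(A minimal sketch of what I mean, assuming the usual saturating power-law form L(N) = E + A·N^(−α); the parameter values and model sizes here are made up purely for illustration, not taken from any published fit. The point is that pinning the intercept E too low or too high looks fine over the fitted range and only diverges once extrapolated losses get close to E, in either direction.)

```python
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(N, E, A, alpha):
    """Saturating power law: loss approaches the irreducible term E as N grows."""
    return E + A * N ** (-alpha)

rng = np.random.default_rng(0)
N = np.logspace(6, 9, 12)                      # hypothetical model sizes
true_E, true_A, true_alpha = 1.7, 400.0, 0.3   # assumed "true" parameters (illustrative)
L = scaling_law(N, true_E, true_A, true_alpha) * (1 + 0.01 * rng.standard_normal(N.size))

# A free fit recovers something close to the true intercept on the fitted range...
(E_hat, A_hat, alpha_hat), _ = curve_fit(scaling_law, N, L, p0=[1.0, 100.0, 0.5])

# ...but a mis-pinned intercept only shows up once extrapolated losses near E:
# an underestimated E predicts continued improvement that never arrives, while
# an overestimated E predicts a plateau that later models then "overperform".
N_big = np.logspace(9, 12, 4)
for E_fixed in (true_E - 0.3, true_E + 0.3):   # deliberately mis-pinned intercepts
    (A_fix, a_fix), _ = curve_fit(lambda n, A, a: scaling_law(n, E_fixed, A, a),
                                  N, L, p0=[100.0, 0.5])
    pred = scaling_law(N_big, E_fixed, A_fix, a_fix)
    print(f"E pinned at {E_fixed:.2f}: extrapolated losses {np.round(pred, 3)}")
```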
I suppose that is logical enough.