I like your point that “surprises cut both ways”, and I take it that this is why your timelines aren’t affected by the possibility of surprises. Is that about right? I’m still confused about the ~zero net effect, though: isn’t double descent basically what we see with giant language models lately? Disclaimer: I don’t work on LLMs myself, so my confusion isn’t necessarily meaningful.
My timelines are affected by the possibility of surprises; it makes them wider on both ends.
My impression is that giant language models are not trained to the interpolation point (though I haven’t been keeping up with the literature for the last year or so). I believe the graphs in that post were created specifically to demonstrate that if you did train them past the interpolation point, then you would see double descent.
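Since double descent in this sense is easy to reproduce at toy scale, here is a minimal sketch (my own, not from the thread) of what “training past the interpolation point” looks like: minimum-norm least squares on random ReLU features, where test error peaks when the number of features roughly equals the number of training points and then falls again as the model gets wider. All the constants and function names are illustrative choices, not anything from the discussion above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: noisy 1-D regression. All constants are illustrative.
n_train, n_test = 40, 2000

def target(x):
    return np.sin(2 * np.pi * x)

x_train = rng.uniform(-1, 1, n_train)
y_train = target(x_train) + 0.3 * rng.standard_normal(n_train)
x_test = rng.uniform(-1, 1, n_test)
y_test = target(x_test)

def relu_features(x, w, b):
    """Random ReLU feature map: phi_j(x) = max(0, w_j * x + b_j)."""
    return np.maximum(0.0, np.outer(x, w) + b)

print(" width | mean test MSE")
for n_feat in [5, 10, 20, 40, 80, 160, 640]:
    mses = []
    for _ in range(20):  # average over feature draws to smooth the curve
        w = rng.standard_normal(n_feat)
        b = rng.standard_normal(n_feat)
        # lstsq returns the minimum-norm solution once n_feat > n_train,
        # which is the regime where the second descent shows up.
        theta, *_ = np.linalg.lstsq(
            relu_features(x_train, w, b), y_train, rcond=None
        )
        preds = relu_features(x_test, w, b) @ theta
        mses.append(np.mean((preds - y_test) ** 2))
    print(f"{n_feat:6d} | {np.mean(mses):.3f}")
```

Running this, test error should spike near width 40 (the interpolation threshold, where features = training points) and then decrease again for wider models, which is the double-descent curve in miniature.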