Rohin Shah comments on Deepmind’s Gato: Generalist Agent

Rohin Shah 13 May 2022 9:13 UTC
4 points
My timelines are affected by the possibility of surprises; it makes them wider on both ends.
My impression is that giant language models are not trained to the interpolation point (though I haven’t been keeping up with the literature for the last year or so). I believe the graphs in that post were created specifically to demonstrate that if you did train them past the interpolation point, then you would see double descent.