tailcalled comments on Deepmind’s Gato: Generalist Agent

tailcalled 13 May 2022 9:34 UTC
4 points
- I also didn’t expect double descent and grokking but it’s worth noting that afaict those have had ~zero effects on SOTA capabilities so far.
I was under the impression that basically all SOTA capabilities rely on double descent. Is that impression wrong?
- Rohin Shah 13 May 2022 9:56 UTC
  3 points
  Parent
  … Where is that impression coming from? If this is a widespread view, I could just be wrong about it; I have a cached belief that large language models and probably other models aren’t trained to the interpolation threshold and so aren’t leveraging double descent.
  - tailcalled 13 May 2022 10:17 UTC
    4 points
    Parent
    I haven’t kept track of dataset size vs model size, but things I’ve read on the double descent phenomenon have generally described it as a unified model of the “classic statistics” paradigm where you need to deal with the bias-variance tradeoff, versus the “modern ML” paradigm where bigger=better.
    
    I guess it may depend on the domain? Generative tasks like language modelling or image encoding implicitly end up having a lot more bits/sample than discriminative tasks? So maybe generative tasks are usually not in the second descent regime while discriminative tasks usually are?