Pattern comments on New Scaling Laws for Large Language Models

Pattern 9 Apr 2022 18:26 UTC
2 points
One funky thing about this is that we shouldn’t see larger language models… at all, for at least a few years.
How long does it take to train them, though? For a large enough value of "large", the above seems obvious, and yet... why couldn’t a larger model simply be trained over more time? (Thinking Long And Slow)
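
To put rough numbers on the "more time" question, here is a back-of-the-envelope sketch. It assumes the commonly used approximation that training compute is about 6 × parameters × tokens, which is not something stated in this thread, and the cluster throughput and utilization figures are purely hypothetical. The function name `training_days` is mine, not from any library.

```python
def training_days(params, tokens, cluster_flops_per_s, utilization=0.4):
    """Estimate wall-clock training time in days.

    Assumes total training compute ~= 6 * params * tokens FLOPs
    (a common rule of thumb, treated here as an assumption).
    `cluster_flops_per_s` is the cluster's peak throughput and
    `utilization` is the fraction of that peak actually achieved;
    both are hypothetical inputs.
    """
    total_flops = 6 * params * tokens
    effective_flops_per_s = cluster_flops_per_s * utilization
    seconds = total_flops / effective_flops_per_s
    return seconds / 86_400  # seconds per day


# Hypothetical cluster: 1e18 FLOP/s peak at 40% utilization.
base = training_days(params=70e9, tokens=1.4e12, cluster_flops_per_s=1e18)

# Same cluster, 10x the parameters and 10x the tokens: 100x the time.
bigger = training_days(params=700e9, tokens=14e12, cluster_flops_per_s=1e18)

print(f"base model: {base:.1f} days")
print(f"10x model on 10x data: {bigger:.1f} days")
```

Under these assumptions, wall-clock time scales linearly with compute on fixed hardware, so "train for longer" is possible in principle. The catch is that scaling parameters and data together makes the bill grow roughly quadratically in model size, so "longer" quickly becomes years rather than months.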