One funky thing about this is that we shouldn’t see larger language models… at all, for at least a few years.
How long does it take to train them though? For a large enough value of large, the above seems obvious, and yet...why couldn’t a larger model be trained over more time? (Thinking Long And Slow)
How long does it take to train them though? For a large enough value of large, the above seems obvious, and yet...why couldn’t a larger model be trained over more time? (Thinking Long And Slow)