540 billion parameters is about 3 times more than GPT-3's 175 billion, which is consistent with a Moore's Law doubling time of about 18 months. I don't see how this is evidence for language model scaling slowing down.
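A quick sanity check of this arithmetic, assuming the 540B model refers to PaLM (announced April 2022) and using GPT-3's commonly cited figures (175B parameters, May 2020), so roughly 23 months between the two:

```python
import math

# Assumed figures (not stated in the thread): GPT-3 at 175B (May 2020),
# the 540B model (presumably PaLM) about 23 months later (April 2022).
n_gpt3 = 175e9
n_540b = 540e9
months_elapsed = 23

growth = n_540b / n_gpt3                      # ~3.1x
doublings = math.log2(growth)                 # ~1.6 doublings
implied_doubling_time = months_elapsed / doublings

print(f"growth factor: {growth:.2f}x")
print(f"implied doubling time: {implied_doubling_time:.1f} months")
# -> roughly 14 months per doubling, in the same ballpark as the
#    ~18-month Moore's Law figure cited above.
```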
As Adam said, trending with Moore's Law is far slower than the previous trajectory of model scaling. In 2020, after the release of GPT-3, there was widespread speculation that trillion-parameter models would begin to emerge by the following year.
Language model parameter counts were growing much faster than 2x/18mo for a while.
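For comparison with that earlier trajectory, a rough check using the commonly cited figures for GPT-2 (1.5B parameters, Feb 2019) and GPT-3 (175B, May 2020), which are assumptions of this sketch rather than numbers from the thread:

```python
import math

# GPT-2 -> GPT-3: roughly 117x growth in about 15 months (Feb 2019 -> May 2020).
growth = 175e9 / 1.5e9
doubling_time = 15 / math.log2(growth)   # ~2.2 months per doubling

print(f"{growth:.0f}x growth, ~{doubling_time:.1f} months per doubling")
# Far faster than the ~18-month Moore's Law doubling discussed above.
```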