I think LLMs are great and plausibly superhuman at language
I think the problem might be that “language” encompasses a much broader variety of tasks than image generation does. For example, generating poetry with a particular rhyme scheme or meter seems like a pretty “pure” language task, yet even GPT-4 struggles with it. Meanwhile, diffusion models with a quarter of GPT-4’s parameter count can output art in a dizzying variety of styles, from Raphael-like Renaissance realism to Picasso-like cubism.
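To make “a particular rhyme scheme” concrete, here’s a minimal sketch of the kind of constraint a model’s output has to satisfy. It assumes the `pronouncing` package (a wrapper around the CMU Pronouncing Dictionary); the helper names and the example quatrain are mine, for illustration only:

```python
# Sketch: check whether a quatrain follows an ABAB rhyme scheme,
# using the CMU Pronouncing Dictionary via the `pronouncing` package.
import pronouncing

def last_word(line: str) -> str:
    # Take the final word of a line, stripped of trailing punctuation.
    return line.strip().rstrip(".,;:!?").split()[-1].lower()

def rhymes(a: str, b: str) -> bool:
    # Two words rhyme if either appears in the other's CMU rhyme list.
    return b in pronouncing.rhymes(a) or a in pronouncing.rhymes(b)

def follows_abab(quatrain: list[str]) -> bool:
    w = [last_word(line) for line in quatrain]
    return rhymes(w[0], w[2]) and rhymes(w[1], w[3])

poem = [
    "The autumn wind blows cold and thin,",
    "Across the fields of fading light,",
    "It rattles at the window's tin,",
    "And ushers in the early night.",
]
print(follows_abab(poem))  # True: lines 1/3 and 2/4 rhyme
```

The point is that the constraint is crisp and mechanically checkable, yet GPT-4-class models still fail it far more often than a diffusion model “fails” at producing a requested visual style.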