Google’s new text-to-image model—Parti, a demonstration of scaling benefits

Google has released its latest text-to-image generation model, Parti. They provide a few prompts and showcase the differences between models with 350M, 750M, 3B, and 20B parameters.

One difference from last week’s Imagen is that Parti is autoregressive rather than diffusion-based. Imagen and DALL-E 2 are diffusion models, whereas Parti is a sequence-to-sequence model, a scaled-up Transformer that generates image tokens from a ViT-VQGAN codebook.
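To make the contrast concrete, here is a toy sketch of the autoregressive recipe: encode the text prompt, then predict discrete image tokens one at a time conditioned on the prompt and the tokens so far, and finally hand the token grid to an image detokenizer (ViT-VQGAN in Parti’s case). Everything below is a stand-in — the codebook size, token count, and the stub encoder/decoder are illustrative assumptions, not Parti’s actual components.

```python
import random

# Toy sketch of Parti-style autoregressive image generation.
# Assumed/illustrative values, NOT the real model's:
VOCAB_SIZE = 8192   # size of the image-token codebook
NUM_TOKENS = 16     # tokens per image (real models use large grids, e.g. 32x32)

def encode_text(prompt):
    # Stand-in for a Transformer text encoder: a deterministic pseudo-embedding.
    return sum(ord(c) for c in prompt)

def next_token_scores(cond, prefix):
    # Stand-in for the decoder: a pseudo-random score per codebook entry,
    # seeded by the conditioning and the tokens generated so far.
    rng = random.Random(hash((cond, tuple(prefix))))
    return [rng.random() for _ in range(VOCAB_SIZE)]

def generate_image_tokens(prompt):
    cond = encode_text(prompt)
    tokens = []
    for _ in range(NUM_TOKENS):
        scores = next_token_scores(cond, tokens)
        # Greedy decoding: pick the highest-scoring codebook entry.
        tokens.append(max(range(VOCAB_SIZE), key=scores.__getitem__))
    return tokens  # a real system would pass these to the VQGAN image decoder

tokens = generate_image_tokens("two dogs playing chess")
print(len(tokens))
```

A diffusion model like Imagen instead denoises a full image (or latent) over many steps, so there is no token-by-token loop; that is the family difference the announcement highlights.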

The announcement says:

Parti and Imagen are complementary in exploring two different families of generative models – autoregressive and diffusion, respectively.

...

We have decided not to release our Parti models, code, or data for public use without further safeguards in place.

There’s an interesting thread on Parti by Jason Baldridge here, and a short overview here by Google. I wonder how well the 20B model will render text characters inside images compared to diffusion-based approaches like Imagen and DALL-E 2.