I wonder how well the 20B model will do on text characters inside images compared to diffusion-based approaches like Imagen and DALL-E 2.
Well, you can see plenty of text in the samples. Obviously, like Imagen, it beats the pants off DALL-E 2 inasmuch as you can actually read the text; not a high bar. Harder to see if it really improves over Imagen: the COCO FID improvement is small, and otherwise they omit any real Imagen-vs-Parti head-to-head comparison. They advertise Parti’s ability to do long complex prompts with high fidelity, so maybe for long text insertions it’ll clearly win?
The difference in text readability compared to DALL-E 2 is laughable.
They have provided some examples after the references section, including some direct comparisons with DALL-E 2 for text in images. Also, PartiPrompts looks like a good collection of novel prompts for eval.