I think the biggest issue for your example with Magic cards is that a certain level of art-style consistency between the cards in a set is necessary. In my experience with DALL-E, that consistency isn’t possible yet. You’ll create one art piece with a prompt, but then edit the prompt slightly and it will come out in a rather different style.
As I keep emphasizing, DALL-E makes deliberate tradeoffs and is deliberately inaccessible, deliberately barring basic capabilities it ought to have (like letting you use GLIDE directly), and so is a loose lower bound on current image-synthesis capabilities, never mind future ones. For example, Stable Diffusion is already being used with style transfer, and the final checkpoint hasn’t even been officially released yet (that’s scheduled for later today; EDIT: it’s out). So if you can’t get adequate stylistic similarity simply by dialing in a very long, detailed prompt with style keywords (noting that, because they avoid unCLIP, Midjourney/Stable Diffusion seem to handle long prompts more like Imagen/Parti, i.e. better), you can generate a set of content images and a style image, and run style transfer over the set.
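(For anyone who wants to try the shared-style-keywords route, here is a minimal sketch using the Hugging Face diffusers StableDiffusionPipeline; the model ID, style suffix, card subjects, and seed are placeholder assumptions, and the exact pipeline arguments and output attribute may differ across diffusers versions. The resulting images could then be used as the content set for an off-the-shelf style-transfer implementation if the prompt alone isn’t consistent enough.)

```python
# Sketch: generate a set of Magic-card art with one shared style suffix,
# so the pieces come out in roughly the same style across the whole set.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4"  # placeholder checkpoint name
).to("cuda")

# One long, detailed style suffix shared by every card prompt.
STYLE = ("oil painting, muted earth tones, dramatic chiaroscuro lighting, "
         "fantasy illustration, detailed brushwork")

cards = [
    "a goblin raiding party",
    "an ancient treefolk shaman",
    "a drowned coastal ruin",
]

for i, subject in enumerate(cards):
    # Re-using the same seed for every card nudges composition and palette
    # toward consistency across the set.
    gen = torch.Generator(device="cuda").manual_seed(1234)
    image = pipe(f"{subject}, {STYLE}",
                 generator=gen,
                 guidance_scale=7.5,
                 num_inference_steps=50).images[0]
    image.save(f"card_{i}.png")
```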
And of course, now that you have the actual model, all sorts of approaches and improvements become available that you will never, ever be allowed to try with DALL-E 2.
Images with text is one, like you mentioned.
Imagen/Parti show that this is not an intrinsic challenge, just one solved by scale. (Now, if only Google would let you pay for any access to them, that would be the perfect rebuttal...) Also, inserting some lettering would be one of the easiest things to do yourself, or to farm out as a very quick, easy commission for <<$200.
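(As a rough illustration of how trivial the lettering step is, a few lines of Pillow will do; the font path, coordinates, and card name below are placeholder assumptions:)

```python
# Sketch: overlay card text on generated art instead of trusting the model
# to render legible lettering.
from PIL import Image, ImageDraw, ImageFont

art = Image.open("card_0.png").convert("RGB")
draw = ImageDraw.Draw(art)
font = ImageFont.truetype(
    "/usr/share/fonts/truetype/dejavu/DejaVuSerif-Bold.ttf", 36)  # placeholder font path

# Draw a dark title bar across the top, then write the card name into it.
draw.rectangle([(0, 0), (art.width, 56)], fill=(20, 20, 20))
draw.text((16, 10), "Goblin Raiding Party", font=font, fill=(240, 230, 200))
art.save("card_0_titled.png")
```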
Another one that seems like it would be a big problem for a Magic set is human faces, which DALL-E is notoriously bad at.
No, it does faces pretty well IMO. And Make-A-Scene shows that you can avoid having to solve it with scale by using a face-specific loss.
I would have been very misinformed, in a very damaging way, in the work I do every day, if you hadn’t refuted some of the erroneous claims made in this post and in that comment.
On balance this post still would have been very helpful for my analyst work, but even more so thanks to you clearing this up.