I’m more familiar with DALL-E 2 than with Midjourney, so I’ll assume that they have the same shortcomings. If not, feel free to ignore this.
It seems like there are still some crucial problems with AI art that will prevent it from being used for many types of art. Those problems will probably be fixed soon, which is why I would say “on the cusp” rather than “it’s already here”.
I think the biggest issue for your example with Magic cards is that a certain level of art-style consistency between the cards in a set is necessary. From my experience with DALL-E, that consistency isn’t possible yet. You’ll create one art piece with a prompt, but then edit the prompt slightly and it will have a rather different style. See, for example, Scott Alexander’s attempt at making stained glass: https://astralcodexten.substack.com/p/a-guide-to-asking-robots-to-design
I’m curious: if you made a set of Magic cards (even, say, ten cards) and then asked other people who are into Magic to decide which ten are better, how many would choose the existing set? I would bet that they would choose the existing set because of its style consistency.
Beyond that, like you said, there are some places where the AIs are just not there yet. Images with text is one, like you mentioned. Another one that seems like it would be a big problem for a Magic set is human faces, which DALL-E is notoriously bad at. Worse, it’s bad at them in ways that are rather obvious to viewers.
Both of these issues seem likely to be solved soon, but they’re not quite here yet. My use of DALL-E so far would still incline me towards paying a real artist.
> I think the biggest issue for your example with Magic cards is that a certain level of art-style consistency between the cards in a set is necessary. From my experience with DALL-E, that consistency isn’t possible yet. You’ll create one art piece with a prompt, but then edit the prompt slightly and it will have a rather different style.
As I keep emphasizing, DALL-E makes deliberate tradeoffs and is deliberately inaccessible, deliberately barring basic capabilities it ought to have (like letting you use GLIDE directly), and so is a loose lower bound on current image-synthesis capabilities, never mind future ones. For example, Stable Diffusion is already being used with style transfer, and the final checkpoint hasn’t even been officially released yet (that’s scheduled for later today EDIT: out). So if you can’t get adequate stylistic similarity by simply dialing in a very long, detailed prompt with style keywords (noting that, because they avoid unCLIP, Midjourney/Stable Diffusion seem to handle long prompts more like Imagen/Parti, i.e. better), you can generate a set of content images and a style image, and run style transfer over the set.
And of course, now that you have the actual model, all sorts of approaches and improvements become available that you will never, ever be allowed to do with DALL-E 2.
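(For concreteness, here is a minimal sketch of that “style transfer over the set” workflow, using classic Gatys-style optimization against a torchvision VGG19. The filenames, layer choices, and loss weights are illustrative placeholders, not a claim about any particular pipeline.)

```python
# Minimal sketch: impose one style image's statistics on a whole set of
# generated content images (Gatys et al. optimization, torchvision >= 0.13).
# ImageNet normalization is omitted for brevity; filenames are placeholders.
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.to(device).eval()
for p in vgg.parameters():
    p.requires_grad_(False)

prep = transforms.Compose([transforms.Resize((512, 512)), transforms.ToTensor()])

STYLE_LAYERS = {0, 5, 10, 19, 28}  # conv1_1 .. conv5_1
CONTENT_LAYER = 21                 # conv4_2

def features(x):
    """Collect style and content activations from the frozen VGG."""
    style_feats, content_feat = [], None
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in STYLE_LAYERS:
            style_feats.append(x)
        if i == CONTENT_LAYER:
            content_feat = x
        if i == 28:  # nothing needed past conv5_1
            break
    return style_feats, content_feat

def gram(x):
    """Channel-by-channel correlation matrix, the usual style statistic."""
    _, c, h, w = x.shape
    f = x.reshape(c, h * w)
    return (f @ f.t()) / (c * h * w)

style = prep(Image.open("style.png").convert("RGB")).unsqueeze(0).to(device)
style_grams = [gram(f) for f in features(style)[0]]

for name in ["card1.png", "card2.png", "card3.png"]:  # the generated set
    content = prep(Image.open(name).convert("RGB")).unsqueeze(0).to(device)
    target = features(content)[1].detach()
    out = content.clone().requires_grad_(True)
    opt = torch.optim.Adam([out], lr=0.02)
    for _ in range(300):
        opt.zero_grad()
        sf, cf = features(out)
        loss = F.mse_loss(cf, target) + 1e4 * sum(
            F.mse_loss(gram(f), g) for f, g in zip(sf, style_grams))
        loss.backward()
        opt.step()
    transforms.ToPILImage()(out.detach().cpu().squeeze(0).clamp(0, 1)).save(
        "styled_" + name)
```

One style image anchoring every content image is the whole trick; the per-image optimization is slow, but it runs offline and parallelizes trivially across the set.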
> Images with text is one, like you mentioned.
Imagen/Parti show that this is not an intrinsic challenge but one solved by scale. (Now, if only Google would let you pay for any access to them, that would be the perfect rebuttal...) Also, inserting some lettering would be one of the easiest things to do yourself, or to hire out as a very quick, easy commission for <<$200.
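(If you did the lettering yourself, a few lines of Pillow would do for a first pass; the font file, position, and title below are hypothetical stand-ins.)

```python
# Overlay lettering on a generated image with Pillow.
# Font path, position, and text are placeholders for illustration.
from PIL import Image, ImageDraw, ImageFont

img = Image.open("card_art.png").convert("RGBA")
draw = ImageDraw.Draw(img)
font = ImageFont.truetype("SomeCardFont.ttf", size=36)  # hypothetical font file

# A thin outline keeps the title legible against busy art.
draw.text((40, 24), "Cult Guildmage", font=font,
          fill="white", stroke_width=2, stroke_fill="black")
img.save("card_with_text.png")
```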
> Another one that seems like it would be a big problem for a Magic set is human faces, which DALL-E is notoriously bad at.
No, it does faces pretty well IMO. And Make-A-Scene shows that you can avoid having to solve it with scale by using a face-specific loss.
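(To gesture at what a face-specific loss looks like: the rough sketch below penalizes reconstructions whose face crops drift from the original in a pretrained face-embedding space. The facenet-pytorch network is an assumed stand-in for illustration, not what the Make-A-Scene paper actually used, and the crop and normalization details are simplified.)

```python
# Rough sketch of a face-specific perceptual loss in the spirit of
# Make-A-Scene: compare face crops of reconstruction and target in a
# pretrained face-embedding space. facenet-pytorch is a stand-in here;
# the input standardization the embedder expects is omitted for brevity.
import torch
import torch.nn.functional as F
from facenet_pytorch import InceptionResnetV1

face_net = InceptionResnetV1(pretrained="vggface2").eval()
for p in face_net.parameters():
    p.requires_grad_(False)

def face_loss(recon, target, boxes):
    """boxes: (x1, y1, x2, y2) face regions from an external detector."""
    losses = []
    for x1, y1, x2, y2 in boxes:
        r = F.interpolate(recon[:, :, y1:y2, x1:x2], size=(160, 160))
        t = F.interpolate(target[:, :, y1:y2, x1:x2], size=(160, 160))
        losses.append(F.mse_loss(face_net(r), face_net(t)))
    return torch.stack(losses).mean() if losses else recon.new_zeros(())

# Added to the usual reconstruction loss during training, this upweights
# fidelity exactly where viewers are most sensitive.
```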
Thank you for clarifying this. I would have been very misinformed, in a very damaging way, in the work I do every day, if you hadn’t refuted some of the erroneous claims made in this post and in that comment.
On balance this post still would have been very helpful for my analyst work, but it’s even more so thanks to you clearing this up.
At least for actual Magic cards, it’s not just a matter of consistency in some abstract sense. Cards from the same set need to relate to each other in very precise ways, and the constraints involved are much more subtle than “please keep the same style”.
Here you can find some examples of real art descriptions that got used for real cards (just google “site:magic.wizards.com art descriptions” for more examples). I could describe further constraints that are implicit in those already long descriptions. For example, consider the Cult Guildmage in the fourth image. When the art description lists “Guild: Rakdos”, it’s implicitly asking to give the whole card a black-and-red tone, and possibly to insert the guild logo somewhere in the picture (the guild logo looks like this; note how the guildmage wears one).
I don’t want to dispute that AI-generated artworks are very cheap and can be absolutely stunning, but I still predict that AI as available today would do a terrible job if used to replace human illustrators for Magic cards (you might have a better time using AI artwork for a brand-new trading card game, however).
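(To make the implicit-constraint point concrete, here is a hypothetical sketch of compiling a structured art description into a prompt. The guild-to-style table and the fields are invented for illustration; real art descriptions encode far more than this.)

```python
# Hypothetical sketch: expand a structured Magic art description into a
# text prompt, making implicit constraints (palette, iconography) explicit.
GUILD_STYLE = {
    "Rakdos": "black and red palette, Rakdos guild emblem visible",
    "Selesnya": "green and white palette, Selesnya guild emblem visible",
}

def build_prompt(desc: dict) -> str:
    parts = [desc["subject"], desc.get("setting", "")]
    if guild := desc.get("guild"):
        parts.append(GUILD_STYLE[guild])
    parts.append("consistent with the set's art direction")
    return ", ".join(p for p in parts if p)

print(build_prompt({
    "subject": "a cult guildmage casting a spell",
    "setting": "ruined city plaza",
    "guild": "Rakdos",
}))
# -> "a cult guildmage casting a spell, ruined city plaza,
#     black and red palette, Rakdos guild emblem visible,
#     consistent with the set's art direction"
```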
> I think the biggest issue for your example with Magic cards is that a certain level of art-style consistency between the cards in a set is necessary. From my experience with DALL-E, that consistency isn’t possible yet. You’ll create one art piece with a prompt, but then edit the prompt slightly and it will have a rather different style.
Hmm, I haven’t had much trouble getting DALL-E to output consistent styles. There’s some upfront cost in figuring out how to get the style I want, but then it tends to work pretty reliably, or at least I develop a sense of how to tweak it to maintain the style. (Albeit this does take extra time, and is part of why, in my other comment, I note that I think Davis is undercounting the cost of AI art.)
Midjourney seems to be better at stylistic consistency; see, for example, the images in this post, which hold together stylistically: https://alexanderwales.com/the-ai-art-apocalypse/