Indeed, but an image generator is supposed to be useful for something other than generating an endless scroll of generic awesome pictures with wonky details; this kind of thing becomes boring really quickly. What most people actually need from an image generator is a sufficiently good replacement for the drawing skill they don’t have.
To be clear, I share Portia’s frustration here. I’ve been trying to get image generators to generate DnD portraits for months, and if the character is something more complicated than a Generic Tolkienian Elf or similar, you have to play increasingly complex shenanigans to obtain passable results. For example, I really, really couldn’t convince the AI to generate an elf literally made of green metal rather than just dressed in green (this was supposed to represent the effect of a particular prestige class turning the character into a construct).
SDXL gives me something like this. But I don’t know, maybe it’s not what you had in mind?
I used this Hugging Face Space: https://huggingface.co/spaces/google/sdxl
And a prompt roughly: An elven face made out of green metal—dungeons and dragons, fantasy, awesome lighting
Huh, actually I never tried just the face; I needed at least the upper torso and preferably the full figure.
Anyway, I spent a few hours today toying with that generator (I previously used mostly this). A very simple prompt like “An elf made out of green metal” can produce a somewhat okay result, but the elf will be either naked or dressed head to toe in green. You can try to add more bits to the prompt in a controlled manner: hair color/hairstyle, outfit/dress color, and the like, but the more details you add, the more the model is prone to forget some of them, and the first to be forgotten is often the most important (being made of green metal).
To be clear, the success rate is not 0%. I was eventually able to obtain an image kinda resembling what I wanted, but I had to sit through >200 bad images and it definitely wasn’t an easy task. For this kind of thing, we are totally not at the point where image generation “just works” (if you instead need a generic fantasy elf, sure, then it just works on the first try).
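For anyone who wants to try the “add bits in a controlled manner” approach more systematically, here is a minimal sketch of what I mean. It is not tied to any particular generator: it only builds the prompt strings, keeping the core trait first and adding one detail per step, so you can see at which step the model starts dropping “made of green metal”. The function name, the example details, and the style suffix are all made up for illustration, and putting the core trait first is just a habit of mine, not something guaranteed to help.

```python
from typing import List


def incremental_prompts(core: str, details: List[str], suffix: str = "") -> List[str]:
    """Build a series of prompts that add one detail at a time.

    `core` is the trait that must not be lost; it always comes first.
    `details` are optional extras, added in the given order.
    `suffix` is an optional style tail appended to every prompt.
    All names here are hypothetical helpers, not part of any generator's API.
    """
    prompts = []
    for n in range(len(details) + 1):
        parts = [core] + details[:n]  # core trait plus the first n details
        if suffix:
            parts.append(suffix)
        prompts.append(", ".join(parts))
    return prompts


if __name__ == "__main__":
    # Example details are placeholders; swap in whatever the character needs.
    for prompt in incremental_prompts(
        "an elf made out of green metal",
        ["long silver hair", "red cloak", "full figure"],
        suffix="dungeons and dragons, fantasy",
    ):
        print(prompt)
```

You then feed each prompt to the generator by hand (or through whatever interface it offers) and note the first step where the core trait disappears.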
Yeah, though Dall-E 3 generally has better language understanding than other text-to-image models. (See e.g. here) I still think the “experimental” approach is more interesting for me personally than the deliberate one you describe. For example, with the previous Bing Image Creator (Dall-E 2.5), I “explored” photographs of fictional places, like Tlön, Uqbar, and strange art in an abandoned museum in Atlantis. It is a process of discovery rather than targeted creation. It’s probably personal preference. I’m not very creative, so I wouldn’t know what to draw if I could draw.