cubefox comments on Please stop using mediocre AI art in your posts

cubefox 25 Aug 2024 19:37 UTC
44 points
9
Yeah. Regarding Dall-E 3 specifically: Few people know that there was an unnamed model between Dall-E 2 and 3: “Dall-E 2.5”, as I like to call it. It was initially used in Microsoft’s Bing Image Creator. It often produced surprisingly good aesthetics, especially with very short or unspecific prompts.
Then it got replaced with Dall-E 3, which produces substantially fewer visual errors (e.g. too many limbs) and is much better at following complex prompts, but its style is also way worse than that of 2.5. Like apparently most other text-to-image models, Dall-E 3 largely produces tacky, aesthetically worthless kitsch. RIP Dall-E 2.5, which is now completely unavailable. :(
A few comparisons using old images I did a while ago:
Dall-E 2.5: “A cat reading a book”
Dall-E 3: “A cat reading a book”
Dall-E 2.5: “strange woman, surrealistic photorealism”
Dall-E 3: “strange woman, surrealistic photorealism”
Dall-E 2.5: “Dracula”
Dall-E 3: “Dracula”
Dall-E 2.5: “Photograph of the unlikely guests”
Dall-E 3: “Photograph of the unlikely guests”
Dall-E 2.5: “surrealism”
Dall-E 3: “surrealism”
Dall-E 2.5: “inside view”
Dall-E 3: “inside view”
Dall-E 2.5: “view from the window”
Dall-E 3: “view from the window”
Dall-E 2.5: “the unspeakable”
Dall-E 3: “the unspeakable”
Dall-E 2.5: “the unlikely guest”
Dall-E 3: “the unlikely guest”
Dall-E 2.5: “The Mistake”
Dall-E 2.5: “The Mistake”
- Raemon 25 Aug 2024 20:24 UTC
  5 points
  0
  Somewhat relatedly: when I started this post I planned to argue you should use Midjourney instead of Dall-E, but when I went to test it I found that Midjourney had become more generic in some way that was hard to place. I think it’s still better than the Dall-E default, but it’s not a slam dunk.
  - Raemon 27 Aug 2024 17:01 UTC
    7 points
    0
    I decided to try out cubefox’s prompts on current Midjourney to give a sense of how it does:
    It actually did pretty well. I think my previous “hrm, it doesn’t seem as good” was when I was specifically trying to get it to generate images for LessWrong posts, where it seemed to be defaulting to very generic landscapes (or generic women’s faces). I’ll try a round of those on recent curated posts.
    - Raemon 27 Aug 2024 17:26 UTC
      5 points
      0
      I tried again with a recent curated post, just putting in the title. (Historically, in practice I usually start with the post title and see if that outputs anything interesting, and then start exploring other ideas based on my understanding of the post and what visuals I think would be good. But this is pretty time-consuming, so seeing what the “one shot” result is is useful.)
      Here’s “Liability regimes for AI, watercolor painting”, from the latest Midjourney (v6)
      For contrast, here’s what Midjourney v4 resulted in:
      (I kinda like the Lady Justice option)
      Midjourney v3
      v2
      Midjourney v3 feels like it hit some kind of sweet spot of “relatively high quality” but also “a bit more weird/quirky/dreamlike.”
      I went and tried to do Midjourney v3, this time putting in more “quality” words since it isn’t automatically trained on aesthetics as hard:
      
      “Liability regimes for AI”, beautiful aquarelle painting by Thomas Schaller, high res:
      okay now I guess I’ll just wander back up the chain of versions, this time using my Metis on what kinds of prompts will get better results rather than seeing how it handles the dumbest case:
      “Liability regimes for AI”, beautiful aquarelle painting by Thomas Schaller, high res (Midjourney v4):
      “Liability regimes for AI”, beautiful aquarelle painting by Thomas Schaller, high res (Midjourney v6):
      - Dweomite 28 Aug 2024 4:49 UTC
        2 points
        0
        I preferred your v4 outputs for both prompts. They seem substantially more evocative of the subject matter than v6 while looking substantially better than v3, IMO.
        (This was a pretty abstract prompt, though, which I imagine poses difficulty?)
        I am not an artist.
    - cubefox 30 Aug 2024 14:03 UTC
      2 points
      0
      Oh, so current Midjourney is actually far better than Dall-E 3 in terms of aesthetics. One thing I still liked about the old Dall-E 2.5 was its relatively strong tendency towards photorealism. Because arguably “a cat reading a book” describes an image of an actual cat reading a book, rather than an image of an illustration of a cat reading a book, or some mixture of photo and Pixar caricature, as in the case of Dall-E 3. Though this could probably be adjusted by adding “a photo of”.
  - cubefox 25 Aug 2024 20:48 UTC
    7 points
    0
    I guess the bad aesthetics are to some extent a side effect of some training/fine-tuning step that improves some other metric (like prompt following), and they don’t have a person who knows/cares about art enough to block “improvements” with such side effects.
    
    In the case of Dall-E, the history was something like this: Dall-E 1/2: No real style; generations presumably looked like an average prediction from the training sample, e.g. like a result from Google Images. Dall-E 2.5: Mostly good aesthetics, e.g. portraits tend to have dramatic lighting and contrast. Dall-E 3: Very tacky aesthetics, probably not intentional but a side effect of something else.
    
    So I guess the bad style would be in general a mostly solvable problem (or one which could be weighed against other metrics), if the responsible people are even aware there is a problem. Which they might not be, given that they probably have a background in CS rather than art.
    - gwern 26 Aug 2024 1:02 UTC
      14 points
      5
      
      I guess the bad aesthetics are to some extent a side effect of some training/fine-tuning step that improves some other metric (like prompt following), and they don’t have a person who knows/cares about art enough to block “improvements” with such side effects.
      
      Also probably a lot of it is just mode collapse from simple preference learning optimization. Each of your comparisons shows a daring, risky choice which a rater might not prefer, vs a very bland, neutral, obvious, colorful output. A lot of the image-generation gains are illusory, and caused simply by a mode collapse down onto a few well-rated points:
      
      Our experiments suggest that realism and consistency can both be improved simultaneously; however, there exists a clear tradeoff between realism/consistency and diversity. By looking at Pareto optimal points, we note that earlier models are better at representation diversity and worse in consistency/realism, and more recent models excel in consistency/realism while decreasing the representation diversity.
      
      Same problem as tuning LLMs. It’s a sugar-rush, like spending Mickey Mouse bucks at Disney World: it gives you the illusion of progress and feels like it’s free, but in reality you’ve paid for every ‘gain’.
  - gwern 26 Aug 2024 1:09 UTC
    6 points
    1
    
    I found that Midjourney had become more generic in some way that was hard to place.
    
    What you can try doing is enabling the personalization (or use mine), to drag it away from the generic MJ look, and then experimenting with the chaos sampling option to find something more interesting you can then work with & vary.
  - dirk 26 Aug 2024 3:02 UTC
    3 points
    2
    I’m told (by the ‘simple parameters’ section of this guide, which I have not had the opportunity to test but which to my layperson’s eye seems promisingly mechanistic in approach) that adjusting the stylize parameter to numbers lower than its default 100 turns down the midjourney-house-style effect (at the cost of sometimes tending to make things more collage-y and incoherent as values get lower), and that increasing the weird parameter above its default 0 will effectively push things to be unlike the default style (more or less).
  - Tao Lin 26 Aug 2024 19:41 UTC
    1 point
    0
    it’s sad that open source models like Flux have a lot of potential for customized workflows and finetuning but few people use them
    - Raemon 26 Aug 2024 20:17 UTC
      5 points
      0
      We’ve talked (a little) about integrating Flux more into LW, to make it easier to make good images (maybe with a soft nudge towards using “LessWrong watercolor style” by default if you don’t specify something else).
      Although something habryka brought up is that a lot of people’s images seem to be coming from Substack, which has its own (bad) version of it.
- Phib 26 Aug 2024 9:03 UTC
  3 points
  4
  Beautiful, thanks for sharing