gwern comments on rhollerith_dot_com’s Shortform

gwern 28 Mar 2024 20:50 UTC
15 points
I agree. The problem with AI-generated images is that any image you can generate with a prompt like “robot looking at chessboard” is going to contain, almost by definition, no more information than that prompt did, but it takes a lot longer than reading the prompt to look at the image and ascertain that it contains no information and is just AI-generated imagery added ‘to look nice’. This is particularly jarring on a site like LW2 where, for better or worse, images are rare and, when present, usually dense with information.

Worse, they usually don’t ‘look nice’ either. Most of the time, people who use AI images can’t even be bothered to sample one without blatant artifacts, or to do some inpainting to fix up the worst anomalies, or to figure out an appropriate style. The samples look bad to begin with, and a year later, they’re going to look even worse and more horribly dated, and make the post look much worse, like a spammer wrote it. (Almost all images from DALL-E 2 are already hopelessly nasty-looking, and stuff from Midjourney v1–3 and SD1.x likewise, and SD2/SD-XL/Midjourney v4/5 are ailing.) It would be better if the authors of such posts could just insert text like [imagine 'a robot looking at a chessboard' here] if they are unable to suppress their addiction to SEO images; I can imagine that better than they can generate it, it seems.

So my advice would be that if you want some writing to still be read in a year, and to still look good then, you should learn how to use the tools and spend at least an hour per image; and if you can’t do that, then don’t spend time on generating images at all (unless you’re writing about image generation, I suppose). Quickies are fine for funny tweets or groupchats, but serious readers deserve better. Meaningless images don’t need to be included, and the image generators will be much better in a year or two anyway, so you can go back and add them then if you really feel the need.

For Gwern.net, I’m satisfied with the images I’ve generated for my dropcap fonts or as thumbnail previews for a couple of my pages like “Suzanne Delage” or “Screwfly Solution” (where I believe they serve a useful ‘iconic’ summary role in popups & social media previews), but I also put in a lot of work: I typically generate scores to hundreds of images in both MJv5/6 & DALL-E 3, varying them heavily and randomizing as much as possible, before inpainting or tweaking them. (I generally select at a 1:4 or less ratio, and then select out of a few dozen; I archive a lot of the first-stage images in my Midjourney & DALL-E 3 tag-directories if you want to browse them.) It takes hours. But I am confident I will still like them years from now.