Playing with DALL·E 2

I got access to DALL·E 2 yesterday. Here are some pretty pictures!

My goal was to understand what DALL·E 2 (DE2 below) could do well, and what it had trouble understanding or generating. My general hypothesis was that it would do a better job with things that are easy to find on the internet (cute animals, digital scifi things, famous art) and do less well with more abstract or unusual things.

Here’s how it works: you put in a description of a picture, and it thinks for ~20 seconds and then produces 10 images that are variations on that description. The diversity varies quite a bit depending on the prompt.

Let’s see some puppies!

goldendoodle puppy in play position

One thing to be aware of when you see amazing pictures that DE2 generated is that there is some cherry-picking going on. It often takes a few prompts to find something awesome, so the person sharing them may have looked through dozens of images or more.

Still, this is pretty great! Those are recognizably goldendoodle puppies, mostly in something approximating play position.

You can see that the proportions in the generated images are not quite right, and some of the detail is off if you look closely. For instance, the front legs are too long here, the face isn’t quite right, and the ears are a bit weird.

Still, it’s pretty amazing given that it generated this from scratch. Check out how realistic the grass looks. I also like that the background is blurred, though not quite in the way that a camera would do it—the transition is too abrupt.

Ok, but the point of this isn’t that they have a great image generation model, though it’s clearly that. The key thing is its magical ability to actually follow instructions or descriptions of images. Particularly interesting is compositionality: can it combine concepts to generate something it’s never seen before? Answer: yes!

pop art kittens

The concept of “kitten” is pretty simple, though note that a kitten can be rendered in a ton of ways, from line drawings to cute art to photorealistic. Pop art is more complicated: it’s a celebration of everyday images, and one of the most commonly known versions is Warhol’s collection of repeated images in a grid with neon colors that vary per cell. And it mostly gets those things right.

lots of pop art kittens

What about weird things? You can put in any input and it’ll do something.

an ai in the shape of a campfire telling stories to an audience of enthralled forest animals

None of those are Twitter-worthy, but with some trial and error you can get things that are interesting.

an ai in the shape of a campfire telling stories to an audience of enthralled forest animals in hyperrealistic digital fantasy style

“Digital style” is one of the suggestions for getting better images.

“X in Y style” is fun, and it accounts for a lot of the images you see out in the world. Weirdly, the results are pretty sensitive to the exact order you put things in.

Back to puppies, you get pretty different results depending on the placement of “surrealistic” even though the rephrasings seem semantically identical or at least very similar.

goldendoodle puppy in play position in surrealistic style
surrealistic goldendoodle puppy in play position
goldendoodle puppy in play position in the style of surrealism
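If you want to try this kind of comparison on your own prompts, here is a minimal sketch of how the three phrasings above could be generated from the same pieces. This is my own illustration, not anything DE2 provides: the function name and its parameters are made up, and it only builds strings. You would still paste each prompt in by hand.

```python
def style_variants(subject, adjective, style):
    """Build three phrasings that differ only in where the style appears.

    All names here are hypothetical; this just formats strings.
    """
    return [
        f"{subject} in {adjective} style",     # style as a trailing adjective phrase
        f"{adjective} {subject}",              # style as a leading adjective
        f"{subject} in the style of {style}",  # style as a named movement
    ]


variants = style_variants(
    subject="goldendoodle puppy in play position",
    adjective="surrealistic",
    style="surrealism",
)
for prompt in variants:
    print(prompt)
```

Keeping the subject fixed and moving only the style makes it easier to tell whether a difference comes from the wording or from ordinary run-to-run variation.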

One place where DE2 clearly falls down is in generating people. I generated an image for [four people playing poker in a dark room, with the table brightly lit by an ornate chandelier], and people didn’t look human—more like the typical GAN-style images where you can see the concept but the details are all wrong.

Update: image removed because the guidelines specifically call out not sharing realistic human faces.

Anything involving people, small defined objects, and so on, looks much more like the previous systems in this area. You can tell that it has all the concepts, but can’t translate them into something realistic.

This could be deliberate, for safety reasons: realistic images of people are much more open to abuse than other things. Porn, deep fakes, violence, and so on are much more worrisome with people. OpenAI also mentioned that they scrubbed out lots of bad stuff from the training data; possibly one way they did that was removing most images with people.

Things look much better with animals, and better again with an artistic style.

a dream-like oil painting by renoir of dogs playing poker

The cards aren’t right. Dice seem to be a lot easier.

People can also be pretty good if you don’t see faces, though the hands are definitely not right.

a street protest in belfast

Stlalm Anit is my new slogan.

In general all writing I’ve seen is bad. I think this is less likely to be about safety, and more that it’s hard to learn language by looking at a lot of images. However, since DE2 is trained on text, it clearly knows a lot about language at some level—I would expect there’s plenty of data to put out coherent text. Instead it outputs nonsense, focusing on getting the fonts and the background right.

a poem about the singularity on the back wall of a gymnasium
a poem about the singularity written in a serif font

I definitely see serifs! I do not see sense.

Overall this is more powerful, flexible, and accurate than the previous best systems. It is still easy to find holes in it, but with some patience and willingness to iterate, you can make some amazing images.

In conclusion, generating a lot of images from a new state-of-the-art image generation system is fun. Thanks for reading! If there’s interest, I can also explore in-painting. Here are a few more gratuitous pics!

a playing card ace of diamonds, but a target instead of the diamonds
a 3d rendering of an octopus riding a motorcycle through outer space
a cute cartoon image of thor holding up his hammer which is being struck by lightning. in the background, there is an ice giant
astral codex ten
a blog post on less wrong

Reader requests:

the future, in the style of the 70′s
colorless green ideas sleep furiously, digital art
a cybernetic wall street bull with bitcoin lasers for eyes

Is that more or less cool than the actual statue they built in Miami?

a potato wearing a trench coat in a heroic pose, 3d digital art
a proof of the Riemann hypothesis
galaxies colliding in the style of Vincent van Gogh
a strange attractor made of butterflies
strange attractor made of lorenz butterflies
Charles Babbage’s completed Analytical Engine
the concept of beauty

The concept of beauty, according to DE2, is mostly women putting on makeup, which I can’t post due to restrictions on posting faces. These are really realistic, capturing ethnicity and expressing emotion, totally unlike the poker players from earlier. But there’s this one pastoral scene, which is nice.

a girl in the forest with butterfly ideas

For this last one, I edited out some floating writing on the left and asked it to generate [a girl in a beautiful serene forest]. This one was also nice:

a girl in the forest with butterfly ideas
small chunks of curiosity
giant peaceful butterfly with rainbow trail and sunglasses wanders in to new york in the morning
Screenshot from the anime adaptation of James Joyce’s novel Finnegans Wake

Seems kind of like generic anime and not so much Finnegans Wake.

a group of red penguins playing poker

What are those penguins on the bottom left doing?!?

An animal looking curiously in the mirror, but the reflection is a different kind of animal; in digital style.
A cat looking curiously in the mirror, but the reflection is a different kind of animal; in digital style
A cat looking curiously in the mirror, but the reflection is a dog; in digital style

This series suggests that DE2 gets reflections pretty well, but either doesn’t understand what it means to have something else be the reflection, or the prior for a reflection reflecting the thing looking in the mirror is too hard for it to override.

Here’s one where I edited out the cat in the mirror and changed the prompt to be about a dog, and it did something sensible.

A dog looking curiously in the mirror, in digital style
underwater scuba diving cat discovers a pearl
hello world
hello world printed on a computer screen

It got it right twice out of 10 tries. That’s good, right?

a new ai system that can create realistic images

I tried to ask for DALL·E by name but that was a content policy violation.

image creation AI by openAI
Pikachu walking up the stairs of a crisp glossy black castle next to the sea at night. At the top of the stairs is a red carpet. A black regenerative liquid being is jumping out of a green metal Mario pipe at the end of the carpet. Ultra 4K definition.

It managed to get most of those elements in. Ultimately none of those is really satisfying though.

An outback Australian landscape with T-rex dinosaurs being chased by ducklings
An outback Australian landscape with T-rex dinosaurs being chased by ducklings, 3d rendering
two teddy bears doing research in a laboratory in the 80s
The Buddha attaining enlightenment with galaxies entering his mind
The Tesseract from the movie Interstellar, with inverted colours
An AI using a laptop computer to watch YouTube
A big muscular robot cyborg pikachu full of electricity having an arm wrestle with a cyborg Zurg in a shopping aisle, surrounded by shopping carts. Realistic photo.
the most beautiful thing I’ve ever seen
a dragon in the shape of a zebra
a dragon in the shape of a zebra, digital style
the intelligence explosion happening

The good ones here had faces in them so I can’t post them. I like how random this one is.

the end of the world

...is surprisingly calm and beautiful.

the beginning of the world

Boo!

intellectual progress

A pen and some gibberish… is actually a pretty good metaphor for intellectual progress?

an awesome house
a lego spaceship

“A spaceship made of legos” is just more of the same.

an animal that will exist someday
the four elements
the four elements
the periodic table
superhero thor versus mythological thor
Comic book Spider-man vs. movie Spider-man
Before the diamondoid zeppelins clustering in the sky completely blotted out the sun, one came low enough for the message on its flank to be seen: “Tlon, Uqbar, Stlalm Anit”.
The Emperor of the Galaxy, byzantine mosaic
the stuff of dreams
marching modrons

It got the marching part. I guess DE2 hasn’t ever played D&D.

the incredible umber hulk painted by pieter bruegel
A painting of a cat vampire drinking wine by greg rutkowski
Watercolor painting of apples with arms and legs fighting