Checking in on Scott's composition image bet with Imagen 3

2.5 years ago Scott Alexander made a bet that by June of 2025, image gen would have more or less solved compositionality. This was operationalized through 5 prompts, of which a model must get at least 3 correct. There was a premature declaration of victory, but if the bet has been settled since, I haven’t heard about it.

It’s time. Google’s Imagen 3 gets ⁴⁄₅. The bet specifies 10 shots per prompt, but I’m just going to show the four it generates per prompt, since that’s plenty.

1. A stained glass picture of a woman in a library with a raven on her shoulder with a key in its mouth

This is the only one that Imagen doesn’t get. It makes multiple mistakes in the composition. It’s a bit ironic that this is the one it missed given that the whole genesis of the bet was about designing stained glass.

2. An oil painting of a man in a factory looking at a cat wearing a top hat

Purrfect. I wonder what filter tripped to block that fourth one; this seems like a pretty innocuous prompt to me.

3. A digital art picture of a child riding a llama with a bell on its tail through a desert

3 out of 4 ain’t bad. Also I like how well it handles shadows.

4. A 3D render of an astronaut in space holding a fox wearing lipstick

3D renders are so good now that I’m not sure how the 4th image would be different if it were photorealistic.

5. Pixel art of a farmer in a cathedral holding a red basketball

Again with the filter, but otherwise perfect.

Edwin Chen at Surge seems to be the official judge, and he’s a very strict grader, so maybe there’s some risk the basketball isn’t red enough or whatever. But this all seems fairly convincing to me.
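
For anyone who wants the tally spelled out, here is a minimal Python sketch of the pass rule as I described it above: 5 prompts, at least 3 must come out correct. The verdicts are my own readings from this post, not the judge's, and the names `verdicts` and `bet_is_won` are made up for illustration.

```python
# Pass threshold from the bet as summarized in this post: at least 3 of 5 prompts.
REQUIRED_PASSES = 3

# My per-prompt verdicts for Imagen 3 (not the official judge's rulings).
verdicts = {
    "stained glass: woman, library, raven with key": False,
    "oil painting: man, factory, cat in top hat": True,
    "digital art: child on llama with bell on tail": True,
    "3D render: astronaut holding fox with lipstick": True,
    "pixel art: farmer in cathedral, red basketball": True,
}


def bet_is_won(verdicts, required=REQUIRED_PASSES):
    """Return True if enough prompts were judged correct."""
    return sum(verdicts.values()) >= required


passed = sum(verdicts.values())
print(f"{passed}/{len(verdicts)} prompts passed; bet won: {bet_is_won(verdicts)}")
```

Even if a strict grader knocked one of my passes down to a fail, the count would still sit at the threshold of 3.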

Addendum: I was curious if Sora, OpenAI’s video gen AI, could handle the raven/key stained glass prompt. Answer: nope, but at least it tried!
