Checking in on Scott’s composition image bet with Imagen 3
2.5 years ago, Scott Alexander made a bet that by June of 2025 image gen would have more or less solved compositionality, operationalized through 5 prompts, of which a model must get at least 3 correct. There was a premature declaration of victory, but if the bet was ever settled I haven’t heard about it.
It’s time. Google’s Imagen 3 gets 4/5. The bet specifies 10 shots per prompt, but I’m just going to post the four it generates, since that’s plenty.
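For anyone who wants the scoring spelled out, here is a rough sketch of the tally. It assumes a prompt counts as passed if at least one of its shots is judged fully correct; that rule is my reading, not something stated above, and the function names are made up for illustration. The per-shot verdicts are placeholders that loosely follow my own judgments in this post (blocked images are simply left out), not the official judge’s.

```python
def prompt_passes(shot_verdicts):
    """Assumed rule: a prompt passes if any one shot is judged fully correct."""
    return any(shot_verdicts)


def bet_is_won(verdicts_by_prompt, required=3):
    """Return (won, passed_count) for a dict of prompt -> list of per-shot verdicts."""
    passed = sum(prompt_passes(v) for v in verdicts_by_prompt.values())
    return passed >= required, passed


# Illustrative verdicts only: True means "this shot got every element right".
# The exact per-shot values for prompts 1 and 4 are assumptions.
verdicts = {
    "stained glass raven": [False, False, False, False],
    "oil painting cat": [True, True, True],        # fourth image blocked
    "digital art llama": [True, True, True, False],
    "3D render fox": [True, True, True, True],
    "pixel art basketball": [True, True, True],    # one image blocked
}

won, passed = bet_is_won(verdicts)
print(won, passed)  # True 4
```

With only four shots per prompt instead of ten, this is if anything a harder test than the one the bet describes.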
1. A stained glass picture of a woman in a library with a raven on her shoulder with a key in its mouth
This is the only one that Imagen doesn’t get. It makes multiple mistakes in the composition. It’s a bit ironic that this is the one it missed given that the whole genesis of the bet was about designing stained glass.
2. An oil painting of a man in a factory looking at a cat wearing a top hat
Purrfect. I wonder what filter tripped to block that fourth one; this seems like a pretty innocuous prompt to me.
3. A digital art picture of a child riding a llama with a bell on its tail through a desert
3 out of 4 ain’t bad. Also I like how well it handles shadows.
4. A 3D render of an astronaut in space holding a fox wearing lipstick
3D renders are so good now that I’m not sure how the 4th image would be different if it were photorealistic.
5. Pixel art of a farmer in a cathedral holding a red basketball
Again with the filter, but otherwise perfect.
Edwin Chen at Surge seems to be the official judge, and he’s a very strict grader, so maybe there’s some risk the basketball isn’t red enough or whatever. But this all seems fairly convincing to me.
Addendum: I was curious if Sora, OpenAI’s video gen AI, could handle the raven/key stained glass prompt. Answer: nope, but at least it tried!