DirectedEvolution comments on Making DALL-E Count

DirectedEvolution 22 Jul 2022 19:02 UTC
I started by replicating my experiments with the prompt “The number [digit]” for 0 through 10, plus 100. Interestingly, DALL-E is 100% accurate until it reaches 100, where it throws in an extra zero on one of the images.
What happens if we try less common two-digit numbers, like 41, 66, and 87?
DALL-E seems to like duplicating individual digits. My guess is that because every number from 60 to 69 contains at least one 6, images captioned “the number [6X]” are heavily weighted toward any given digit being a 6.
What if we generate a few more two-digit numbers without repeated digits, like 23, 37, and 90?
DALL-E was about 40% accurate here; if we also include the non-duplicate rows above (41 and 87), its overall accuracy on two-digit numbers is 30%.
It’s interesting to me that DALL-E fairly consistently gets the first digit right in the duplicate numbers but fails on the second digit. Does that pattern continue into three-digit numbers? Let’s try 147, 598, and 953.
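One way to tally figures like these mechanically is sketched below. It assumes you have transcribed by hand the number visible in each generated image as a string; the function names (`exact_accuracy`, `positional_accuracy`, `pooled_accuracy`) are hypothetical illustrations, not part of any DALL-E tooling. The position-wise score is one way to check the first-digit-versus-second-digit pattern, and the pooled score divides total correct images by total images across prompts.

```python
def exact_accuracy(target: str, readings: list[str]) -> float:
    """Fraction of images whose transcribed number matches the prompt exactly."""
    if not readings:
        raise ValueError("need at least one reading")
    return sum(r == target for r in readings) / len(readings)


def positional_accuracy(target: str, readings: list[str]) -> list[float]:
    """For each digit position in the target, the fraction of images showing
    the correct digit there. A reading that is too short counts as wrong at
    the missing positions; extra trailing digits are ignored."""
    if not readings:
        raise ValueError("need at least one reading")
    hits = [0] * len(target)
    for r in readings:
        for i, digit in enumerate(target):
            if i < len(r) and r[i] == digit:
                hits[i] += 1
    return [h / len(readings) for h in hits]


def pooled_accuracy(batches: dict[str, list[str]]) -> float:
    """Overall exact-match accuracy across several prompts: total correct
    images divided by total images (equal to the mean of per-prompt rates
    only when every prompt has the same number of images)."""
    correct = sum(sum(r == t for r in rs) for t, rs in batches.items())
    total = sum(len(rs) for rs in batches.values())
    if total == 0:
        raise ValueError("need at least one reading")
    return correct / total
```

Weighting by image count rather than averaging per-prompt rates keeps a prompt with fewer usable images from counting as much as one with more.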
Sort of. DALL-E probably sees three-digit numbers relatively rarely, and these particular three-digit numbers almost never. My guess is that it does better with 147 because it contains a more common digit pair (14), which would also explain why all its guesses are built from those two digits. In the latter two samples, it seems to be riffing on the underlying visual similarity of 9, 5, and 6.
So it seems that DALL-E can “count” when prompted with “the number X”, as long as X is common enough in its training data to “crystallize”, if you will, as an entity of its own, with an identity distinct from other similar shapes. But if we feed DALL-E prompts containing uncommon numbers, it’s biased toward low digits (because those are common) and toward 5, 6, and 9 (because those are visually similar).