PoignardAzur comments on What DALL-E 2 can and cannot do

PoignardAzur 16 May 2022 9:24 UTC
1 point
Interesting. It seems to understand that the pattern should be “Three monkeys with hands on their heads somehow”, but it doesn’t seem to get that each monkey should have hands in a different position.
I wonder if that means gwern is wrong when he says DALL-E 2’s problem is that the text model compresses information, and that the underlying “representation” model instead genuinely struggles with composition and with “there must be three X with only a single Y among them” type constraints.