gturk1 comments on What DALL-E 2 can and cannot do

gturk1 6 May 2022 2:04 UTC
2 points
0
Thank you for sharing all of these DALL-E tests!
I wonder whether it can reproduce three objects that reliably appear together in images. How about one of these prompts:
A bronze statue of three wise monkeys.
See no evil, hear no evil, speak no evil, statue of monkeys.
- Swimmer963 (Miranda Dixon-Luinenburg) 6 May 2022 17:52 UTC
  3 points
  0
  “A bronze statue of three wise monkeys.” Pretty solid!
  “See no evil, hear no evil, speak no evil, statue of monkeys.”
  - PoignardAzur 16 May 2022 9:24 UTC
    1 point
    0
    Interesting. It seems to understand that the pattern should be “Three monkeys with hands on their heads somehow”, but it doesn’t seem to get that each monkey should have hands in a different position.
    I wonder if that means gwern is wrong when he says DALL-E 2’s problem is that the text model compresses information, and that instead the underlying “representation” model genuinely struggles with composition and “there must be three X with only a single Y among them” type of constraints.
  - gturk1 8 May 2022 2:06 UTC
    1 point
    0
    Thank you so much for this! It did do quite well.
    I have been trying to think of another set of three items that are reliably found together, but this is all I could come up with. Pairs of items are much easier to come up with.
  - TibuAI 6 May 2022 23:41 UTC
    1 point
    0
    This is so good.