This is interesting, because you’d think it would at least understand that the cup should be tipping over. It makes me think DALL-E treats the cup and the water as two distinct objects, and doesn’t really grasp that the cup tipping over is what causes the water to spill. It does understand that the water should be located “inside” the cup, but probably purely in an “it looks like the water is inside the cup” sense. I don’t think DALL-E understands the idea of “inside” as an actual location.
I wonder if its understanding of the world is just 2D or semi-3D. Perhaps training it on photogrammetry datasets (photos of the same objects from multiple points of view) would improve that?
“Pages from a flip book of a water glass spilling” I...think DALL-E 2 does not know what a flip book is.
I...think it just does not understand the physics of water spilling, period.
Relatedly, DALL-E is a little confused about how Olympic swimming is supposed to work.