Dall-E knows locations. We put in a watercolor painting I did of our cabin on a lake and asked Dall-E to create a “variation”. The watercolor image Dall-E created was literally my next-door neighbor's cabin, which is a few hundred feet away from ours. Blew my mind how Dall-E even knew the location just based on the image I put in.
I currently roll to disbelieve, and suspect that it just thought a cabin should be there.
What I’m saying is, pics or it didn’t happen ;)
I agree. What sort of images would it even be trained on in the first place which would allow that? It can’t train on a big montage or landscape shot because the dimensions are wrong, and the core model is trained on very small samples to boot, with upscalers handling most of the pixel generation. I would check Google & Yandex image search to see if there are any photographs online with the two cabins in the same photograph, which could hypothetically enable that. I would also try using the closest street addresses to see if one can prompt it directly, since that is likely what would be in the text caption of hypothetical images. Also, testing a photograph rather than a watercolor is an obvious change. A more stringent test would be to do inpainting/uncropping of photographs of both: if it really does ‘know’, it should be highly likely to fill in the other cabin in the right location and surroundings when you ‘pan left’ or whatever. Otherwise, ‘cabins’ are a fairly stereotypical kind of architecture and it just got lucky. OpenAI says DALL-E 2 is well into the low millions of images generated and climbing as fast as overloaded GPUs can spit them out (<=50 completions per day per >30k invited people thus far...), so we’re not even appealing that hard to chance here.
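To put rough numbers on the chance argument, here is a back-of-the-envelope sketch in Python. The 50 completions/day and 30k users are the figures quoted above; the total of 2 million images stands in for "low millions", and the one-in-a-million per-image coincidence rate is a purely hypothetical placeholder, not a measured value.

```python
import math


def daily_rate(invited_users: int, completions_per_user: int) -> int:
    """Images per day if every invited user used their full daily quota."""
    return invited_users * completions_per_user


def p_at_least_one(p_single: float, trials: int) -> float:
    """Probability that an event with per-trial probability p_single
    happens at least once in `trials` independent trials.

    Computed as 1 - (1 - p)^n via log1p/expm1 for numerical stability.
    """
    return -math.expm1(trials * math.log1p(-p_single))


# Figures from the comment: <=50 completions/day, >30k invited people.
rate = daily_rate(invited_users=30_000, completions_per_user=50)

# Assumptions (not from the source): 2M images so far, and a
# hypothetical 1-in-a-million chance that any one image happens to
# look like something personally familiar to the user.
p_some_coincidence = p_at_least_one(p_single=1e-6, trials=2_000_000)

print(f"images/day at full quota: {rate:,}")
print(f"P(at least one striking coincidence): {p_some_coincidence:.3f}")
```

Under those assumed numbers, someone somewhere seeing an uncanny match is more likely than not, which is all the "got lucky" explanation needs. Independence between images is itself an assumption here.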
W h a t that’s wild, wow, I would absolutely not have predicted DALL-E could do that! (I’m curious whether it replicates in other instances.)