How about somehow utilizing pareidolia? Something like asking how this socket is feeling?
Or who’s in this picture?
I think those are very creative ideas, and asking for “non-obvious” things in pictures seems like a good approach. Since basically all really intelligent models are language models, some sort of “image reasoning” task might work.
I tried the socket with the CLIP model, and it got the feeling correct very confidently.
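For anyone wondering what “very confidently” means mechanically, here is a minimal sketch of CLIP-style zero-shot scoring: cosine similarity between one image embedding and several caption embeddings, scaled and pushed through a softmax. The embeddings and the candidate captions below are made-up toy values, not output from a real model or from the test described above, and the scale of 100 is an assumption about the temperature. In a real run the vectors would come from CLIP's image and text encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Softmax over scaled cosine similarities, one probability per caption.

    logit_scale is an assumed temperature; it controls how peaked the
    resulting distribution is.
    """
    logits = [logit_scale * cosine(image_emb, emb) for emb in text_embs.values()]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(text_embs, exps)}


# Hypothetical toy embeddings (3 dimensions instead of CLIP's hundreds).
image_emb = [0.9, 0.1, 0.3]
text_embs = {
    "a surprised socket": [0.8, 0.2, 0.3],
    "a happy socket": [0.1, 0.9, 0.2],
    "an angry socket": [0.2, 0.1, 0.9],
}

probs = zero_shot_probs(image_emb, text_embs)
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

Because of the large scale factor, even a moderate gap in cosine similarity turns into a probability close to 1, so a very confident answer is not surprising once the model leans towards one caption at all.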
I myself can’t see who the person in the bread is supposed to be, so I think an AI would struggle with it too. But on the other hand I think it shouldn’t be too difficult to train a face identification AI to identify people in bread (or hidden in other ways), assuming the developer could create a training dataset from solving some captchas himself.
I’m wondering whether it’s possible to pose long reasoning problems in an image. For example: “Next to the roundest object in the picture, there is a dark object. What other object in the picture is most similar in shape?”
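To make the chain of steps concrete, here is a rough sketch of how a puzzle generator could compute the ground-truth answer from a scene description it already knows. Everything here is hypothetical: the `SceneObject` fields, the darkness threshold, the toy scene, and the shape-distance formula are assumptions for illustration, and “other object” is read as “any object except the dark one”.

```python
import math
from dataclasses import dataclass


@dataclass
class SceneObject:
    name: str
    x: float
    y: float
    roundness: float   # 0..1, where 1 is a perfect circle
    brightness: float  # 0..1, where 0 is black
    aspect: float      # width / height


def shape_distance(a, b):
    """Crude shape dissimilarity: roundness gap plus log aspect-ratio gap."""
    return abs(a.roundness - b.roundness) + abs(math.log(a.aspect) - math.log(b.aspect))


def solve(objects, dark_threshold=0.3):
    # Step 1: find the roundest object.
    roundest = max(objects, key=lambda o: o.roundness)
    # Step 2: find the dark object closest to it.
    dark = [o for o in objects if o is not roundest and o.brightness < dark_threshold]
    if not dark:
        return None  # the puzzle is ill-posed for this scene
    neighbour = min(dark, key=lambda o: math.hypot(o.x - roundest.x, o.y - roundest.y))
    # Step 3: among the remaining objects, pick the most similar shape.
    others = [o for o in objects if o is not neighbour]
    return min(others, key=lambda o: shape_distance(o, neighbour)).name


scene = [
    SceneObject("ball", 1.0, 1.0, 0.98, 0.80, 1.0),
    SceneObject("remote", 1.5, 1.2, 0.20, 0.10, 3.0),
    SceneObject("boot", 8.0, 8.0, 0.40, 0.15, 0.8),
    SceneObject("ruler", 5.0, 2.0, 0.15, 0.70, 4.0),
    SceneObject("mug", 3.0, 6.0, 0.70, 0.90, 1.1),
]

answer = solve(scene)
print(answer)
```

The generator gets these attributes for free because it built the scene. A solver would have to recover all three steps from pixels, which is where the difficulty is supposed to come from.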
I like that direction, but I fear it’ll fail the “90% of internet users” criterion. I also suspect that simple image matching will find similar-enough photos with captions that have the answer.