Q: Has anyone trained an AI on video sequences paired with captions (mostly descriptive, either provided directly or generated by another system) so that the resulting system becomes capable of:
+ describing a given scene accurately
+ predicting movements in visual and/or textual form
+ answering questions about the physical/visible world, e.g. "Does a fridge have wheels?" or "Which animals are we most likely to see on a flower?"