Interestingly, the text-to-speech conversion of the “Text does not equal text” section is another very concrete example of this:
The TTS AI summarizes the “Hi!” ASCII art picture as “Vertical lines arranged in a grid with minor variations”. I deliberately added an alt text to that image, describing what can be seen, and I expected that this alt text would be used for TTS. Seemingly that is not the case, and instead some AI describes the image in isolation. If I were to describe that image without any further context, I would probably mention that it says “Hi!”, but I grant that “Vertical lines arranged in a grid with minor variations” would also be a fair description.
The “| | | |↵|-| | |↵| | | o” string is read out as “dash O”. I would have expected the AI to just read that out in full, character by character, which is probably an example of me falsely taking my intention as a given. There are probably many conceivable cases where it’s actually better for the AI not to read out cryptic strings character by character (e.g. when your text contains some hash or a very long URL). So maybe it can’t really know that this particular case is an exception.
FWIW, I, ostensibly a human, did not see the “Hi!” either and instead saw something like one of those dangling-string curtains with an orb on one (maybe a control pull) and a little crossbar (maybe it’s in front of a window).