One curb cut example that has come up lately is that image captions/descriptions, particularly alt text, may wind up being a curb cut for non-blind people. A normal person rarely benefits from alt text that goes beyond a normal image caption (if a caption is necessary at all), unless they are unable to load images or are using a text-based web browser, in which case the descriptions are helpful: another curb cut! What alt text has turned out to be really useful for, though, is training AI.
Systems like CLIP, Stable Diffusion, and GPT-4V/GPT-4o all depend heavily on alt text as a large fraction of their paired image-text training data. Given how little use alt text is normally, most of it would not exist except for accessibility impulses.
And of course, those systems then benefit normal people in myriad ways, everywhere that any kind of image or video is processed or used in the future. Even if you yourself are not generating images, analyzing images, using improved OCR, or uploading images to ChatGPT/Claude for some task, you still benefit indirectly from things like better YouTube search powered by video embeddings.
And it is a virtuous loop, because now you can use those systems to make alt text much cheaper. I’ve experimented a bit with using GPT-4V to caption images on Gwern.net that lack alt descriptions (prompting it iteratively to ensure it catches details), and it works pretty well. While I personally do not benefit much from better alt text directly, I do benefit indirectly for similar reasons: by encoding images into alt text, I make my website corpus more usable with other systems, since now there is text to be grepped, embedded, read by LLMs, and so on. (This would also help other systems I might like to make: my Utext document format, for example, would benefit from powerful image<->text AI capabilities to allow things like cheap usecase-customized ASCII art.)
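As a rough illustration of how such captioning could be scripted (a minimal sketch, not the actual Gwern.net pipeline: the model name, prompts, file layout, and the simple two-pass refinement loop are all assumptions for the example), something like the following against the OpenAI chat-completions API would do the job:

```python
# Sketch: generate alt text for images with a vision-language model.
# Assumptions: model name, prompts, image directory, and the two-pass
# "refine" loop are illustrative, not the actual Gwern.net pipeline.
import base64
from pathlib import Path

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def encode_image(path: Path) -> str:
    """Base64-encode an image so it can be inlined as a data URL."""
    return base64.b64encode(path.read_bytes()).decode("utf-8")


def caption_image(path: Path, passes: int = 2) -> str:
    """Ask the model for concise alt text, then iterate so it catches missed details."""
    image_url = f"data:image/png;base64,{encode_image(path)}"
    prompt = "Write concise alt text describing this image for a blind reader."
    alt_text = ""
    for _ in range(passes):
        response = client.chat.completions.create(
            model="gpt-4o",  # hypothetical choice of vision-capable model
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }],
        )
        alt_text = response.choices[0].message.content
        # Next pass: ask the model to check its own draft against the image.
        prompt = (
            "Here is a draft alt text for this image:\n"
            f"{alt_text}\n"
            "Revise it to add any important details it missed, staying concise."
        )
    return alt_text


if __name__ == "__main__":
    for img in Path("images").glob("*.png"):
        print(img.name, "->", caption_image(img))
```

Once generated, those strings can be dropped into the `alt` attributes of the corresponding `<img>` tags, at which point they are ordinary text: grep-able, embeddable, and readable by LLMs like any other part of the corpus.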