There isn’t a lot of talk on LW about image models (e.g. DALL-E and Stable Diffusion) in the context of alignment, especially compared to LLMs. Why is that? Some hypotheses:
LLMs just happened to get some traction early, and due to network effects, they are the primary research vehicle
LLMs pose a larger alignment risk than image models, e.g. the only alignment risk of image generation comes from the language embedding
LLMs are not a larger alignment risk, but they are easier to use for alignment research
Following Scott Aaronson, we might say the answer depends on whether we’re talking about a reform or an orthodox vision of alignment. Adversarial images and racial bias are definitely real concerns for automatic vision, and hence for reform alignment. But many animal species have mastered vision, movement, or olfaction better than humans as a species, for hundreds of millions of years, without producing anything that could challenge the competitive advantage of human language, so I’d guess that from an orthodox alignment perspective, vision looks much less scary than language models.
I’m curious whether those comfortable with either the orthodox or the reform label would corroborate these predictions about their views.