I think DALL-E has been nerfed (as a sort of low-grade “alignment” effort) and some of what you’re talking about as “limitations” are actually bugs that were explicitly introduced with the goal of avoiding bad press.
OpenAI has made efforts to implement model-level technical mitigations that ensure that DALL·E 2 Preview cannot be used to directly generate exact matches for any of the images in its training data. However, the models may still be able to compose aspects of real images and identifiable details of people, such as clothing and backgrounds. (sauce)
It wouldn’t surprise me if they just used interpretability tools to find the part of the vector space that represents “the face of any famous real person” and then applied some sort of noise blur to the model itself, as deployed?
Except! Maybe not a “blur” but some sort of rotation of a subspace or something? This hint is weirdly evocative:
they were very recognizably screenshots from Firefly in terms of lighting, ambiance, scenery etc., with an actor who looked almost like Nathan Fillion – as though cast in a remake that was trying to get it fairly similar – and who looked consistently the same across all 10 images, but was definitely a different person.
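(To make the blur-vs-rotation distinction concrete, here is a minimal PyTorch sketch of what perturbing a single learned subspace of an embedding could look like. The “famous face” basis, the function name, and both perturbation modes are purely my assumptions for illustration; nothing about OpenAI’s actual mitigation is public.)

```python
import torch

def perturb_subspace(embedding: torch.Tensor,
                     basis: torch.Tensor,
                     mode: str = "noise",
                     scale: float = 0.5) -> torch.Tensor:
    """Degrade only the component of `embedding` that lies in a suspect
    subspace. `basis` is a (k, d) matrix of orthonormal directions, e.g. a
    hypothetical 'famous face' subspace located with interpretability tools."""
    coords = embedding @ basis.T            # coordinates inside the subspace
    in_subspace = coords @ basis            # component inside the subspace
    remainder = embedding - in_subspace     # orthogonal component, untouched

    if mode == "noise":
        # "Blur": add noise inside the subspace only.
        in_subspace = in_subspace + scale * torch.randn_like(in_subspace)
    elif mode == "rotate":
        # "Rotation": apply one fixed orthogonal map inside the subspace,
        # sending famous-face directions to consistent-but-wrong ones.
        torch.manual_seed(0)                # fixed map, same every call
        q, _ = torch.linalg.qr(torch.randn(basis.shape[0], basis.shape[0]))
        in_subspace = (coords @ q) @ basis

    return remainder + in_subspace
```

A fixed rotation (rather than added noise) would also be consistent with the observation above: the not-quite-Fillion actor stays the same across all 10 images instead of varying randomly.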
The alternative of having it ever ever ever produce a picture of “Obama wearing <embarrassing_thing>” or “Trump sleeping in a box splashed with bright red syrup” or some such… that stuff might go viral… badly...
...so any single thing anyone manages to make has to be pre-emptively and comprehensively nerfed in general?
By comparison, it costs almost nothing to have people complain about how it did some totally bizarre other thing while refusing to shorten the hair of someone who might look too much like “Alan Rickman playing Snape” such that you might see a distinctive earlobe.
Sort of interestingly: in a sense, this damage-to-the-model is a (super low tech) “alignment” strategy!
The thing they might have wanted was to just say “while avoiding any possible embarrassing image (that could be attributed to the company that made the model making the image) in the circa-2022 political/PR meta… <user-provided prompt content>”.
But the system isn’t a genie that can understand the spirit and intent of wishes (yet?) so instead… just reach into the numbers and scramble them in certain ways?
In this sense, I think we aren’t seeing “DALL-E 2 as trained” but rather “DALL-E 2 with some sort of interesting alignment-related lobotomy to make it less able to accidentally stir up trouble”.
Yes, I thought their ‘horse in ketchup’ example made the point well that it’s an ‘artificial stupidity’ Harrison-Bergeron sort of approach rather than a genuine solution. (And then, like BPEs, there seems to be unpredictable fallout which would be hard to benchmark and which no one apparently even thought to benchmark—despite whatever they did on May 1st to upgrade quality, the anime examples still struggle to portray specific characters like Kyuubey, where Swimmer’s examples are all very Kyuubey-esque but never actually Kyuubey. I am told the CLIP used is less degraded, and so we’re probably seeing the output of ‘CLIP models which know about characters like Kyuubey combined with other models which have no idea’.)
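(If someone did want to benchmark that kind of silent character-level degradation, one crude approach would be to score generations against a character description with an off-the-shelf CLIP checkpoint and watch for suspiciously low similarity on on-prompt images. The checkpoint name, prompt, and file paths below are illustrative assumptions, not whatever CLIP variant DALL-E 2 uses internally.)

```python
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

# Public HF checkpoint, standing in for whatever CLIP the service really uses.
MODEL_NAME = "openai/clip-vit-base-patch32"
PROMPT = "Kyubey, the white cat-like mascot from Puella Magi Madoka Magica"

model = CLIPModel.from_pretrained(MODEL_NAME)
processor = CLIPProcessor.from_pretrained(MODEL_NAME)

def character_fidelity(image_paths: list[str]) -> list[float]:
    """Cosine similarity between each generated image and the character
    description; consistently low scores on on-prompt generations would
    flag the kind of unbenchmarked fallout discussed above."""
    images = [Image.open(p) for p in image_paths]
    inputs = processor(text=[PROMPT], images=images,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # image_embeds and text_embeds come back L2-normalized,
    # so the dot product is a cosine similarity.
    sims = out.image_embeds @ out.text_embeds.T
    return sims.squeeze(-1).tolist()
```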