Okay, sure, I kind of buy it. Generated images are closer to each other than to the nearest image in the training set. And the denoisers learn similar heuristics like “do averaging” and “there’s probably a face in the middle of the image.”
I still don’t really feel excited, but maybe that’s me and not the paper.
Haven’t read it in detail, but Fig. 2 seems to me to support the exciting claim (also because these are overparameterized models with 70k trainable parameters)?