There’s a long history of development of artistic concepts; for example, the discovery of perspective. There’s also lots of artistic concepts where the dependence on the medium is highly significant, of which the first examples that come to mind are cave paintings (which were likely designed around being viewed by moving firelight) and ‘highly distorted’ ancient statuettes that are probably self-portraits.

So it seems to me like we should expect there to be a similar medium-relevance for NN-generated artwork, and a similar question of which artistic concepts it possesses and lacks. Perspective, for example, seems pretty well captured by GANs targeting the LSUN bedroom dataset or related tasks. It seems like it’s not a surprise that NNs would be good at perspective compared to humans, since there’s a cleaner inverse between the perception and the creation of perspective from the GAN’s point of view than the human’s (who has to use their hands to make it, rather than their inverted eyes).
That is, it seems that the similarities are actually pretty constrained to bits where the medium means humans and NNs operate similarly, and the path of human art generation and the path of NN art generation look quite different in a way that suggests there are core differences. For example, I think most humans have pretty good facility with creating and understanding ‘stick figures’ that comes from training on a history of communicating with other humans using stick figures, rather than simply generalizing from visual image recognition, and we might be able to demonstrate the differences through a training / finetuning scheme on some NN that can do both classification and generation.
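A minimal sketch of what such a scheme might look like: one network with a shared encoder, a classification head, and a generation (decoder) head, trained jointly. One could then finetune the decoder on a small set of stick-figure images and see whether its generations generalize the way human stick-figure drawing does. All architecture choices, sizes, and names here are illustrative assumptions, not from any real experiment.

```python
import torch
import torch.nn as nn

class ClassifyAndGenerate(nn.Module):
    """Toy network that can do both classification and generation."""

    def __init__(self, img_dim=64, latent_dim=16, n_classes=10):
        super().__init__()
        # Shared representation, analogous to "visual image recognition".
        self.encoder = nn.Sequential(
            nn.Linear(img_dim, 32), nn.ReLU(), nn.Linear(32, latent_dim)
        )
        self.classifier = nn.Linear(latent_dim, n_classes)  # recognition path
        self.decoder = nn.Sequential(                       # generation path
            nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, img_dim)
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.classifier(z), self.decoder(z)

model = ClassifyAndGenerate()
x = torch.randn(8, 64)                  # a toy batch of flattened images
labels = torch.randint(0, 10, (8,))

logits, recon = model(x)
# Joint objective: recognize and reconstruct through the same encoder,
# so finetuning one path can be probed for effects on the other.
loss = nn.functional.cross_entropy(logits, labels) \
     + nn.functional.mse_loss(recon, x)
loss.backward()
```

The experiment the paragraph gestures at would compare the decoder’s stick-figure outputs before and after exposure to stick-figure data, against a human baseline.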
We might want to look for concepts that are easier for humans than NNs; when I talk to people about ML-produced music, they often suggest that it’s hard to capture the sort of dependencies that make for good music using current models (in the same way that current models have trouble making ‘good art’ that’s more than style transfer or realistic faces or so on; it’s unlikely that we could hook a NN up to a DeviantArt account and accept commissions and make money). But as someone who can listen to lots of Gandalf nodding to Jazz, and who thinks there are applications for things like CoverBot (which would do the acoustic equivalent of style transfer), my guess is that the near-term potential for ML-produced music is actually quite high.
There’s a long history of development of artistic concepts; for example, the discovery of perspective. There’s also lots of artistic concepts where the dependence on the medium is highly significant, of which the first examples that come to mind are cave paintings (which were likely designed around being viewed by moving firelight) and ‘highly distorted’ ancient statuettes that are probably self-portraits.
Great examples. I agree the physical medium is really important in human art: see my Section 1.3.1.
It seems like it’s not a surprise that NNs would be good at perspective compared to humans, since there’s a cleaner inverse between the perception and the creation of perspective from the GAN’s point of view than the human’s (who has to use their hands to make it, rather than their inverted eyes).
I like the point about hands vs. “inverted eyes”. At the same time, the GANs are trained on a huge number of photos, and these photos exhibit a perfect projection of a 3D scene onto a finite-size 2D array. The GAN’s goal is to match these photos, not to match 3D scenes (which it doesn’t know anything about). Humans invented perspective before having photos to work with. (They did have mirrors and primitive projection techniques.)
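This point can be made concrete by looking at what a GAN objective actually touches. In the minimal sketch below (toy shapes and architectures, assumed only for illustration), both losses are functions of 2D pixel arrays alone; nothing in the objective refers to a 3D scene, so any scene representation must emerge indirectly in the generator’s latent space.

```python
import torch
import torch.nn as nn

# Toy generator (latent code -> "image") and discriminator ("image" -> real?).
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 64))
D = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))

photos = torch.randn(16, 64)   # stand-in for real photographs (2D pixels only)
z = torch.randn(16, 8)
fake = G(z)

bce = nn.functional.binary_cross_entropy_with_logits
# Discriminator loss: tell real photos from generated ones.
d_loss = bce(D(photos), torch.ones(16, 1)) + bce(D(fake.detach()), torch.zeros(16, 1))
# Generator loss: make generated pixels indistinguishable from photos.
g_loss = bce(D(fake), torch.ones(16, 1))
# Neither term ever sees a 3D scene, only pixel statistics.
```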
I think most humans have pretty good facility with creating and understanding ‘stick figures’ that comes from training on a history of communicating with other humans using stick figures, rather than simply generalizing from visual image recognition,
I agree that our facility with stick figures probably depends partly on the history of using stick figures. However, I think our general visual recognition abilities make us very flexible. For example, people can quickly master new styles of abstract depiction that differ from the XKCD style (say in a comic or a set of artworks). DeepMind has a cool recent paper where they learn abstract styles of depiction with no human imitation or labeling.
We might want to look for concepts that are easier for humans than NNs; when I talk to people about ML-produced music, they often suggest that it’s hard to capture the sort of dependencies that make for good music using current models (in the same way that current models have trouble making ‘good art’ that’s more than style transfer or realistic faces or so on; it’s unlikely that we could hook a NN up to a DeviantArt account and accept commissions and make money).
Currently humans play a major role in the interesting examples of neural art. Getting more artist-like autonomy is probably AI-complete, but I can imagine neural nets being more and more widely used in both visual art and music. I agree there’s great potential in neural music! (I suggest some experiments in my conclusion but there’s tons more that could be tried).
The GAN’s goal is to match these photos, not to match 3D scenes (which it doesn’t know anything about).
I’ve seen some results here where I thought the consensus interpretation was “angle as latent feature”, such that there was an implied 3D scene in the latent space. (Most of what I’m seeing now with a brief scan has to do with facial rotations and pose invariance.) Maybe I should put ‘scene’ in scare quotes, because it’s generally not fully generic, as the sorts of faces and rooms you find in such a database are highly nonrandom / have a bunch of basic structure you can assume will be there.
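The “angle as latent feature” probing procedure can be sketched as a latent traversal: if viewing angle is encoded as a direction in latent space, walking along that direction should smoothly rotate the generated scene. The untrained toy generator below produces meaningless outputs, but the procedure is the same one applied to real GANs; `angle_direction` here is a hypothetical placeholder for a direction that would in practice be found by probing a trained model.

```python
import torch
import torch.nn as nn

# Toy stand-in for a trained generator (latent code -> flattened image).
generator = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 64))

z = torch.randn(1, 8)                    # a base latent code
angle_direction = torch.randn(1, 8)      # in practice: a probed/learned direction
angle_direction /= angle_direction.norm()

# Walk along the candidate "angle" direction in five steps.
steps = torch.linspace(-2.0, 2.0, 5).reshape(-1, 1)
codes = z + steps * angle_direction      # broadcasts to (5, 8)
frames = generator(codes)                # five "images", one per step
# For a real model, one would inspect these frames for a consistent
# rotation of the implied scene as the step size varies.
```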