I agree there’s great variety and intellectual sophistication in art. My paper argues that the Sensory Optimization model captures *some* (not all) key properties of visual art. The model is simple, easy to experiment with (e.g. generating art-like images), and captures a surprising amount. That said, there are probably simple computational models that could do better and I’d be excited to see concrete proposals.
The paper does touch on some of your concerns. Feature Visualization can generate non-representational images (Section 1.2). I suspect these images could be made more aesthetic and evocative by training on datasets with captions that include human emotional and aesthetic responses (Section 2.3), and the same goes for art that’s strongly rooted in emotions (Section 2.3.3). Do you have examples in mind when you mention “human experience” and “embodiment” and “limited agents”? I don’t really address art where the artist has different knowledge/understanding than the audience and that’s an important topic for further work (Section 2.3.4 is related).
I agree that lots of art (including some painting) is “heavily linguistic, or social, or relies on … thinking on the part of the audience”. Having a computational model that can generate this kind of art is plausibly AGI-complete. Yet (as already noted) it’s likely we can do better than my current model.
(In general, I’m optimistic about what neural nets can create by Sensory Optimization and related techniques. Current neural nets have zero experience of the physical act of painting or drawing. They have no understanding of how animals or humans move and act in the world or of human values or interests. Yet even with zero prior training on visual art they can make pretty impressive images by human lights. I think this was surprising to most people both in and outside deep learning. I’m curious whether this was surprising to you.)
Regarding your last paragraph, I want to make some clarifications. I don’t express a view about whether Deep Dream makes art. I claim that by combining ideas from Deep Dream and Style Transfer with richer datasets we could create something close to a basic form of human visual art. I don’t claim that the creative process for humans is like optimization by gradient descent. Instead, humans optimize by drawing on their general intelligence (e.g. hierarchical planning, analytical reasoning, etc.).
Sure, you can get the AI to draw polka dots by targeting a feature that likes polka dots, or a Mondrian by targeting some features that like certain geometries and colors, but now you’re not using style transfer at all: the image is the style. Moreover, it would be pretty hard to use this to get a Kandinsky, because the AI that makes style-paintings has no standard by which it would choose things to draw that could be objects but aren’t. You’d need a third and separate scheme to make Kandinskys, and then I’d just bring up another artist not covered yet.
If you’re not trying to capture all human visual art in one model, then this is no biggie. So now you’re probably going “this is fine, why is he going on about this.” So I’ll stop.
Do you have examples in mind when you mention “human experience” and “embodiment” and “limited agents”?
For “human experience,” yeah, I just mean something like communicative/evocative content that relies on a theory of mind to communicate. Maybe you could train an AI on patriotic paintings and then it could produce patriotic paintings, but I think only by working on theory of mind would an AI think to produce a patriotic painting without having seen one before. I’m also reminded of Karpathy’s example of Obama with his foot on the scale.
For embodiment, this means art that blurs the line between visual and physical. I was thinking of how some things aren’t art if they’re normal sized, but if you make them really big, then they’re art. Since all human art is physical art, this line can be avoided mostly but not completely.
For “limited,” I imagined something like Dennett’s example of the people on the bridge. The artist only has to paint little blobs, because they know how humans will interpret them. Compared to the example above of using understanding of humans to choose content, this example uses an understanding of humans to choose style.
Yet even with zero prior training on visual art they can make pretty impressive images by human lights. I think this was surprising to most people both in and outside deep learning. I’m curious whether this was surprising to you.
It was impressive, but I remember the old 2015 post that Chris Olah co-authored. First off, if you look at the pictures, they’re less pretty than the pictures that came later. And I remember one key sentence: “By itself, that doesn’t work very well, but it does if we impose a prior constraint that the image should have similar statistics to natural images, such as neighboring pixels needing to be correlated.” My impression is that DeepDream et al. have been trained to make visual art, by hyperparameter tuning (grad student descent).
You’d need a third and separate scheme to make Kandinskys, and then I’d just bring up another artist not covered yet.
Again, replicating all human art is probably AGI-complete. However, there are some promising strategies for generating non-representational art and I’d guess artists were (implicitly) using some of them. Here are some possible Sensory Optimization objectives:
1. Optimize the image to be a superstimulus for random sets of features in earlier layers (this was already discussed).
2. Use Style Transfer to constrain the low-level features in some way. This could aim at grid-like images (Mondrian, Kelly, Albers) or a limited set of textures (Richter). This is mentioned in Section 1.3.1.
3. If you want the image to evoke objects (without explicitly depicting them), then you could combine (1) and (2) with optimizing for some object labels (e.g. river, stairs, pole). This is simpler than your Kandinsky example but could still be effective.
4. In addition to (1) and (2), optimize the image for human emotion labels (having trained on a dataset with emotion labels for photos). To take a simplistic example: people will label photos with lots of green or blue (e.g. forest or sea or blue skies) as peaceful/calming, and so abstract art based on those colors would be labeled similarly. Red or muddy-gray colors would produce a different response. This extends beyond colors to visual textures, shapes, symmetry vs. disorder and so on. (Compare this Rothko to this one).
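To make the list above a bit more concrete, here is a minimal sketch of how these objectives could be combined into a single score. It assumes the network's activations and label scores have already been computed for a candidate image. The function names, the equal default weights, and the use of a Gram matrix for the style term are my illustrative assumptions, not something specified in the paper.

```python
import numpy as np

def gram_matrix(features):
    """Channel-by-channel correlations of a (channels, positions) feature map."""
    return features @ features.T / features.shape[1]

def sensory_objective(features, label_scores, target_gram, unit_ids, label_id,
                      w_units=1.0, w_style=1.0, w_label=1.0):
    """Weighted sum of three terms; higher means a better image.

    All argument names and weights are hypothetical, chosen for illustration.
    """
    # (1) Superstimulus: mean activation of a chosen set of early-layer units.
    units_term = features[unit_ids].mean()
    # (2) Style constraint: penalize distance from the target low-level statistics.
    style_term = -np.sum((gram_matrix(features) - target_gram) ** 2)
    # (3)/(4) Score for an object label or an emotion label.
    label_term = label_scores[label_id]
    return float(w_units * units_term + w_style * style_term + w_label * label_term)
```

In an actual experiment the image would be updated by gradient ascent on this score. The relative weights decide whether the result leans towards pure texture, a constrained style, or something that evokes a label.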
Maybe you could train an AI on patriotic paintings and then it could produce patriotic paintings, but I think only by working on theory of mind would an AI think to produce a patriotic painting without having seen one before.
I agree with your general point about the relevance of theory of mind. However, I think Sensory Optimization could generate patriotic paintings without training on them. Suppose you have a dataset that’s richer than ImageNet and includes human emotion and sentiment labels/captions. Some photos will cause patriotic sentiments: e.g. photos of parades or parties on national celebrations, photos of a national sports team winning, photos of iconic buildings or natural wonders. So to create patriotic paintings, you would optimize for labels relating to patriotism. If there are emotional intensity ratings for photos, and patriotic scenes cause high intensity, then maybe you could get patriotic paintings by just optimizing for intensity. (Facebook has trained models on a huge image dataset with Instagram hashtags, some of which relate to patriotic sentiment. Someone could run a version of this experiment today. However, I think it’s a more interesting experiment if the photos are more like everyday human visual perception than the carefully crafted/edited photos you’ll find on Instagram.)
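As a rough sketch of what "optimize for labels relating to patriotism" could mean mechanically, here is gradient ascent on the log-probability of a single label. The label model is a toy logistic scorer over raw pixels, standing in for a network trained on sentiment-labeled photos. That model, the function names, and the step size are all assumptions for illustration.

```python
import numpy as np

def label_logprob_and_grad(img, weights, bias):
    """Log-probability of one label under a toy logistic model, and its pixel gradient."""
    z = float(np.sum(weights * img) + bias)
    log_p = -np.log1p(np.exp(-z))              # log(sigmoid(z))
    grad = (1.0 - 1.0 / (1.0 + np.exp(-z))) * weights  # (1 - sigmoid(z)) * dz/dimg
    return log_p, grad

def optimize_for_label(img, weights, bias, steps=50, lr=0.1):
    """Gradient ascent on the label's log-probability, keeping pixels in [0, 1]."""
    img = img.copy()
    for _ in range(steps):
        _, grad = label_logprob_and_grad(img, weights, bias)
        img = np.clip(img + lr * grad, 0.0, 1.0)
    return img
```

With a real network the gradient would come from backpropagation rather than a closed form, and the optimization would usually be combined with a constraint on style or image statistics so the result does not collapse into an adversarial pattern.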
I was thinking of how some things aren’t art if they’re normal sized, but if you make them really big, then they’re art.
Again, I expect a richer training set would convey lots of this information. Humans would use different emotional/aesthetic labels on seeing unusually large natural objects (e.g. an abnormally large dog or man, a huge tree or waterfall).
For “limited,” I imagined something like Dennett’s example of the people on the bridge. The artist only has to paint little blobs, because they know how humans will interpret them.
Some artworks depend on idiosyncratic quirks of human visual cognition (e.g. optical illusions). It’s probably hard for a neural net to predict how humans will respond to all such works (without training on other images that exploit the same quirk). This will limit the kind of art the Sensory Optimization model can generate. Still, this doesn’t undermine my claim that artists are doing something like Sensory Optimization. For example, humans have a bias towards seeing faces in random objects (pareidolia). By exploiting this, artists can create an image that looks like two things at once. (The artist knows the illusion will work, because it works on his or her own visual system.)
My impression is that DeepDream et al. have been trained to make visual art—by hyperparameter tuning (grad student descent).
I think this first blogpost on Deep Dream and the original paper on Style Transfer were already very impressive. The regularization tweak for Deep Dream is very simple and quite different from what I mean by “training on visual art”. (It’s less surprising that a GAN trained on visual art can generate something that looks like visual art, although it is surprising how well they can deal with stylized images.)
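To give a sense of how small such a tweak can be, here is a minimal sketch of one way to encode "neighboring pixels should be correlated": a squared-difference smoothness penalty (a relative of total variation) subtracted from the feature objective. This is an assumed stand-in for illustration. The original Deep Dream code may have enforced the prior differently, and the names and weights below are hypothetical.

```python
import numpy as np

def smoothness_penalty(img):
    """Sum of squared differences between neighboring pixels (lower = smoother)."""
    dy = img[1:, :] - img[:-1, :]
    dx = img[:, 1:] - img[:, :-1]
    return float((dy ** 2).sum() + (dx ** 2).sum())

def smoothness_grad(img):
    """Analytic gradient of smoothness_penalty with respect to each pixel."""
    g = np.zeros_like(img)
    dy = img[1:, :] - img[:-1, :]
    dx = img[:, 1:] - img[:, :-1]
    g[1:, :] += 2 * dy
    g[:-1, :] -= 2 * dy
    g[:, 1:] += 2 * dx
    g[:, :-1] -= 2 * dx
    return g

def dream_step(img, feature_grad, lr=0.1, smooth_weight=0.05):
    """One ascent step on (feature activation - smooth_weight * smoothness_penalty).

    feature_grad is the gradient of the chosen feature's activation with
    respect to the (float) image, assumed to be supplied by the network.
    """
    return img + lr * (feature_grad - smooth_weight * smoothness_grad(img))
```

Nothing in this penalty refers to paintings or drawings. It only says that natural images are locally smooth, which is why I'd distinguish it from training on visual art.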
Thanks for the reply :)