The trippy shoggoth title image was mysterious when it was originally posted; basically, someone leaked the image a little before the Inceptionism blog post.
A CNN is a reasonable model for fast feedforward vision. We can isolate this pathway in biological vision by using rapid serial visual presentation: basically, flashing an image for 100ms or so.
So imagine you saw just a flash of one of these images, for a brief moment, and then had to quickly press a button for the image category. There is no time to think about it; it’s a Jeopardy-style instant response.
There is no button for “noisy image”, there is no button for “wavy line image”, etc.
Now, the fooling images are generated by an adversarial process. It’s as if we have a copy of a particular mind in a VR sim: we flash it an image and see what button it presses. Based on the response, we generate a new image, unwind time, and repeat. We keep doing this until we get some weird classification errors. It allows us to explore the decision space of the agent.
It is basically reverse engineering. It requires a copy of the agent’s code or at least access to a copy with the ability to do tons of queries, and it also probably depends on the agent being completely deterministic. I think that biological minds avoid this issue indirectly because they use stochastic sampling based on secure hardware/analog noise generators.
Stochastic models/ANNs could probably avoid this issue.
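A minimal sketch of the intuition, not a claim about any real network: the toy linear scorer, its `noise_std` parameter, and the probe below are all assumptions made for illustration. Each query to the stochastic variant adds fresh noise, so comparing two single queries no longer reliably tells an attacker whether brightening a pixel helped.

```python
import random

def score(pixels, weights, noise_rng=None, noise_std=1.0):
    """Toy linear 'baseball' score; optionally perturbed by fresh noise per query."""
    value = sum(w * p for w, p in zip(weights, pixels))
    if noise_rng is not None:
        value += noise_rng.gauss(0.0, noise_std)
    return value

def probe_accuracy(pixels, weights, step=0.02, noise_rng=None):
    """Fraction of pixels where 'brighten and compare' reports the true direction."""
    correct = 0
    for i, w in enumerate(weights):
        brighter = list(pixels)
        brighter[i] += step
        went_up = score(brighter, weights, noise_rng) > score(pixels, weights, noise_rng)
        correct += went_up == (w > 0)  # the true effect of brightening is the sign of w
    return correct / len(weights)

rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(400)]   # hypothetical model
pixels = [rng.uniform(0.2, 0.8) for _ in range(400)]  # hypothetical image

deterministic_accuracy = probe_accuracy(pixels, weights)
stochastic_accuracy = probe_accuracy(pixels, weights, noise_rng=random.Random(1))
print(deterministic_accuracy, stochastic_accuracy)
```

With the deterministic scorer every probe is informative; with noise much larger than the effect of one pixel, a single comparison is close to a coin flip. An attacker who can average many repeated queries would recover some of that signal, so this only illustrates the search getting harder, not becoming impossible.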
I look at the bizarre false positives and I wonder if (warning: wild speculation) the problem is that the networks were not trained to recognize the lack of objects. For example, in most cases you have some noise in the image, so if every training image is something, or rather something-plus-noise, then the system could learn that the noise is 100% irrelevant and pick out the something.
(The noisy images look to me like they have small patches in one spot faintly resembling what they’re identified as — if my vision had a rule that deemphasized the non-matching noise and I had a much smaller database of the world than I do, then I think I’d agree with those neural networks.)
If the above theory is true, then a possible fix would be to include in training data a variety of images for which the expected answers are like “empty scene”, “too noisy”, “simple geometric pattern”, etc. But maybe this is already done — I’m not familiar with the field.
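As a rough sketch of what such extra training data could look like (the generators, the 8x8 image size, and the function names are hypothetical, chosen only for illustration), one could synthesize examples for each "nothing here" label and append them to the ordinary dataset:

```python
import random

REJECT_LABELS = ("empty scene", "too noisy", "simple geometric pattern")

def make_reject_example(label, size=8, rng=random):
    """Build one small grayscale image (list of rows) for a 'no object' label."""
    if label == "empty scene":
        level = rng.random()                      # uniform gray, no structure
        return [[level] * size for _ in range(size)]
    if label == "too noisy":
        return [[rng.random() for _ in range(size)] for _ in range(size)]
    if label == "simple geometric pattern":
        period = rng.randint(2, 4)                # vertical stripes
        return [[float((x // period) % 2) for x in range(size)] for _ in range(size)]
    raise ValueError(f"unknown reject label: {label}")

def augment_with_rejects(dataset, per_label, seed=0):
    """Return the dataset plus `per_label` synthetic examples of each reject label."""
    rng = random.Random(seed)
    extra = [(make_reject_example(label, rng=rng), label)
             for label in REJECT_LABELS
             for _ in range(per_label)]
    return list(dataset) + extra

augmented = augment_with_rejects([("<baseball photo>", "baseball")], per_label=2)
print(len(augmented))
```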
No, even if you classify these false positives as “no image”, this will not prevent someone from constructing new false positives.
Basically the amount of training data is always extremely small compared to the theoretically possible number of distinct images, so it is always possible to construct such adversarial positives. These are not random images which were accidentally misidentified in this way. They have been very carefully designed based on the current data set.
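To put rough numbers on "extremely small", here is a back-of-the-envelope calculation. The image size (256x256, 8-bit RGB) and the training set size (one million images) are assumed round figures, not values from the discussion.

```python
import math

def log10_possible_images(width, height, channels=3, levels=256):
    """log10 of the number of distinct images with the given shape and bit depth."""
    return width * height * channels * math.log10(levels)

log10_images = log10_possible_images(256, 256)   # assumed image shape
log10_training = math.log10(1_000_000)           # assumed training set size
log10_fraction = log10_training - log10_images   # log10 of the fraction of image space covered

print(f"possible images: about 10^{log10_images:.0f}")
print(f"fraction covered by training data: about 10^{log10_fraction:.0f}")
```

Under these assumptions the exponent is in the hundreds of thousands, so any realistic training set covers a vanishing fraction of image space.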
Something similar is probably theoretically possible with human visual recognition as well. The only difference would be that we would be inclined to say “but it really does look like a baseball!”
This technique exploits the fact that the CNN is completely deterministic; see my reply above. It may be very difficult to apply to stochastic networks.
CNNs are comparable to the first 150ms or so of human vision, before feedback, multiple saccades, and higher-order mental programs kick in. So the difficulty of generating these fooling images also depends on the complexity of the inference: a more complex AGI with human-like vision, given more time to solve the task, would probably also be harder to fool, independent of the stochasticity issue.
A human being would be capable of pointing out why something looks like a baseball, that is, of pointing out where the curves and lines are that provoke that idea. We do this when we gaze at clouds without coming to believe there really are giant kettles floating around; we’re capable of taking the abundance of contextual information in the scene into account and coming up with reasonable hypotheses for why what we’re seeing looks like x, y or z. If classifier vision systems had the same ability, they probably wouldn’t make the egregious mistakes they do.
If I understand correctly how these images are constructed, it would be something like this: take some random image. The program can already make some estimate of whether it is a baseball, say 0.01% or whatever. Then you go through the image pixel by pixel and ask, “If I make this pixel slightly brighter, will your estimate go up? If not, will it go up if I make it slightly dimmer?” (This is just an example; you could change the color or whatever as well.) Thus you modify each pixel such that you increase the program’s estimate that it is a baseball. By the time you have gone through all the pixels, the probability of being a baseball is very high. But to us, the image looks more or less just the way it did at first. Each pixel has been modified too slightly to be noticed by us.
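The procedure described here can be sketched in a few lines. This is only an illustration under stated assumptions: a toy logistic model over raw pixels stands in for the CNN, and the weights, the 1024-pixel image, the step size, and the function names are all hypothetical.

```python
import math
import random

def baseball_probability(pixels, weights, bias):
    """Toy stand-in for the classifier: logistic regression over raw pixels."""
    score = bias + sum(w * p for w, p in zip(weights, pixels))
    return 1.0 / (1.0 + math.exp(-score))

def nudge_pixels(pixels, weights, bias, step=0.02):
    """One pass over the image, keeping each tiny change that raises the estimate."""
    image = list(pixels)
    for i in range(len(image)):
        best = baseball_probability(image, weights, bias)
        original = image[i]
        for candidate in (original + step, original - step):  # brighter, then dimmer
            image[i] = min(1.0, max(0.0, candidate))           # stay a valid intensity
            if baseball_probability(image, weights, bias) > best:
                break                                          # keep this change
            image[i] = original                                # undo, try the other one
    return image

rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(1024)]  # hypothetical 32x32 grayscale model
start = [rng.uniform(0.2, 0.8) for _ in range(1024)]  # "some random image"
# Pick the bias so the starting estimate is about 0.01%, as in the description above.
bias = math.log(0.0001 / 0.9999) - sum(w * p for w, p in zip(weights, start))

fooled = nudge_pixels(start, weights, bias)
before = baseball_probability(start, weights, bias)
after = baseball_probability(fooled, weights, bias)
largest_change = max(abs(a - b) for a, b in zip(start, fooled))
print(f"{before:.4%} -> {after:.4%}, largest pixel change {largest_change:.3f}")
```

No pixel moves by more than `step`, yet the many small pushes add up in the model's score. Practical attacks on real networks usually rely on gradients or evolutionary search and do not probe one pixel at a time, but the accumulation effect is the same idea.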
But this means that in principle the program can indeed explain why it looks like a baseball—it is a question of a very slight tendency in each pixel in the entire image.
But the explanation will be just as complex as the procedure used to classify the data. If I change the hue slightly or twiddle the pixels’ RGB values just slightly, the “explanation” for why the data seems to contain a baseball image will be completely different. Human beings, on the other hand, can look at pictures of the same object in different conditions of lighting, of different particular sizes and shapes, taken from different camera angles, etc., and still come up with what would be basically the same set of justifications for matching each image to a particular classification (e.g. an image contains a roughly spherical field of white, with parallel bands of stitch-like markings bisecting it in an arc... hence it’s of a baseball).
The ability of human beings to come up with such compressed explanations, and our ability to arrange them into an ordering, is arguably what allows us to deal with iconic representations of objects and to represent objects at varying levels of detail (as in http://38.media.tumblr.com/tumblr_m7z4k1rAw51rou7e0.png).
“But the explanation will be just as complex as the procedure used to classify the data. If I change the hue slightly or twiddle their RGB values just slightly, the ‘explanation’ for why the data seems to contain a baseball image will be completely different.”
Will it?
What if slightly twiddling the RGB values produces something that is basically “spherical field of white, etc. with enough noise on top of it that humans can’t see it”?
That would all hinge on what it means for an image to be “hidden” beneath noise, I suppose. The more noise you layer on top of an image the more room for interpretation there is in classifying it, and the less salient any particular classification candidate will be. If a scrutable system can come up with compelling arguments for a strange classification that human beings would not make, then its choices would be naturally less ridiculous than otherwise. But to say that “humans conceivably may suffer from the same problem” is a bit of a dodge; esp. in light of the fact that these systems are making mistakes we clearly would not.
But either way, what you’re proposing and what Unknowns was arguing are different. Unknowns was (if I understood him rightly) arguing that the assignment of different probability weights to pixels (or, more likely, groups of pixels) representing a particular feature of an object is an explanation of why they’re classified the way they are. But such an “explanation” is inscrutable; we cannot ourselves easily translate it into the language of lines, curves, apparent depth, etc. (unless we write some piece of software to do this, which is then effectively part of the agent).
Look at it from the other end: you can take a picture of a baseball and overlay noise on top of it. There could, at least plausibly, be a point where overlaying the noise destroys the ability of humans to see the baseball, but the information is actually still present (and could, for instance, be recovered if you applied a noise reduction algorithm to it). Perhaps when you are twiddling the pixels of random noise, you’re actually constructing such a noisy baseball image a pixel at a time.
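A small sketch of the "information is still present" point, under assumptions chosen only for illustration: a white disc stands in for the baseball, the noise is Gaussian with a standard deviation larger than the disc's contrast, and a plain box blur stands in for the noise reduction algorithm.

```python
import math
import random

def disc_image(size=32, radius=9.0):
    """White disc on a black background, a crude stand-in for a baseball."""
    center = (size - 1) / 2
    return [[1.0 if math.hypot(x - center, y - center) <= radius else 0.0
             for x in range(size)] for y in range(size)]

def box_blur(image, radius=2):
    """Average each pixel with its in-bounds neighbours (simple noise reduction)."""
    size = len(image)
    out = []
    for y in range(size):
        row = []
        for x in range(size):
            window = [image[j][i]
                      for j in range(max(0, y - radius), min(size, y + radius + 1))
                      for i in range(max(0, x - radius), min(size, x + radius + 1))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out

def correlation(a, b):
    """Pearson correlation between two images of the same shape."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    mean_a, mean_b = sum(flat_a) / len(flat_a), sum(flat_b) / len(flat_b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(flat_a, flat_b))
    var_a = sum((x - mean_a) ** 2 for x in flat_a)
    var_b = sum((y - mean_b) ** 2 for y in flat_b)
    return cov / math.sqrt(var_a * var_b)

rng = random.Random(0)
clean = disc_image()
noisy = [[v + rng.gauss(0.0, 1.5) for v in row] for row in clean]  # noise swamps each pixel
denoised = box_blur(noisy)

corr_noisy = correlation(noisy, clean)
corr_denoised = correlation(denoised, clean)
print(f"noisy vs clean: {corr_noisy:.2f}, denoised vs clean: {corr_denoised:.2f}")
```

The blurred image tracks the clean disc more closely than the raw noisy one does, which is the sense in which the signal can survive under noise. Whether an adversarially constructed image hides a real baseball in this way is a separate question that the sketch does not answer.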
Agree with all you said, but have to comment on
“Perhaps when you are twiddling the pixels of random noise, you’re actually constructing such a noisy baseball image a pixel at a time.”
You could be constructing a noisy image of a baseball one pixel at a time. In fact, if you actually are, then your network would be amazingly robust. But in a non-robust network, it seems much more probable that you’re just exploiting the system’s weaknesses and milking them for all they’re worth.
A few brief supplements to your introduction:
The source of the generated image is no longer mysterious: “Inceptionism: Going Deeper into Neural Networks”.
But though the above is quite fascinating and impressive, we should also keep in mind the bizarre false positives that a person can generate: “Images that fool computer vision raise security concerns”.