Models, myths, dreams, and Cheshire cat grins
“she has often seen a cat without a grin but never a grin without a cat”
Alice in Alice in Wonderland, about the Cheshire cat (also known as the Unitary Authority of Warrington Cat).
Let’s have a very simple model. There’s a boolean, $C$, which measures whether there’s a cat around. There’s a natural number, $L$, which counts the number of legs on the cat, and a boolean, $G$, which checks whether the cat is grinning (or not).
There are a few obvious rules in the model, to make it compatible with real life:
$\neg C \implies (L = 0)$.
$\neg C \implies \neg G$.
Or, in other words, if there’s no cat, then there are zero cat legs and no grin.
And that’s true about reality. But suppose we have trained a neural net to automatically find the values of $C$, $L$, and $G$. Then it’s perfectly conceivable that something might trigger the outputs $\neg C$ and $G$ simultaneously: a grin without any cat to hang it on.
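A minimal sketch of this toy model in Python (the class, the names, and the hard-coded “prediction” below are illustrative assumptions, not taken from any actual network):

```python
from dataclasses import dataclass

@dataclass
class CatWorld:
    cat_present: bool   # C: is there a cat around?
    num_legs: int       # L: how many cat legs are there?
    grinning: bool      # G: is the cat grinning?

def consistent(w: CatWorld) -> bool:
    """The real-world rules: no cat implies zero legs and no grin."""
    if not w.cat_present:
        return w.num_legs == 0 and not w.grinning
    return True

# Reality always satisfies the rules:
assert consistent(CatWorld(cat_present=False, num_legs=0, grinning=False))

# A trained net predicts C, L and G with separate outputs, so nothing forces
# those outputs to respect the rules; some input could plausibly produce:
prediction = CatWorld(cat_present=False, num_legs=0, grinning=True)
print(consistent(prediction))  # False: a grin without a cat
```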
Adversarial examples
Adversarial examples often seem to behave this way. Take, for instance, this adversarial example of a pig classified as an airliner:
Imagine that the neural net was not only classifying “pig” and “airliner”, but also features like “has wings” and “has fur”.
Then the “pig-airliner” has no wings and has fur, which are features of pigs but not of airliners. Of course, you could build an adversarial model that also breaks “has wings” and “has fur”, but, hopefully, the more features that need to be faked, the harder it would become.
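As a hedged sketch of how those extra outputs could be used, here is a toy consistency check between a class head and two feature heads (the feature names and the expected-feature table are illustrative assumptions, not part of the original example):

```python
# Which feature values we expect for each class (illustrative values).
EXPECTED_FEATURES = {
    "pig":      {"has_wings": False, "has_fur": True},
    "airliner": {"has_wings": True,  "has_fur": False},
}

def features_match(predicted_class: str, predicted_features: dict) -> bool:
    """Return True if the feature heads agree with the class head."""
    expected = EXPECTED_FEATURES[predicted_class]
    return all(predicted_features.get(name) == value
               for name, value in expected.items())

# An adversarial perturbation that only flips the class head leaves an
# incoherent overall prediction, which this check flags:
prediction = {"class": "airliner", "has_wings": False, "has_fur": True}
features = {k: v for k, v in prediction.items() if k != "class"}
print(features_match(prediction["class"], features))
# False -> an "airliner" with fur and no wings looks suspicious
```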
This suggests that, as algorithms get smarter, they will become more adept at avoiding adversarial examples—as long as the ultimate question is clear. In our real world, the categories of pigs and airliners are pretty sharply distinct.
We run into problems, though, if the concepts are less clear—such as what might happen to pigs and airliners if the algorithm optimises them, or how the algorithm might classify underdefined concepts like “human happiness”.
Myths and dreams
Define the following booleans: $H_h$, $H_b$, $J_h$, $J_b$, which detect the presence of a living human head, a living human body, a living jackal head, and a living jackal body, respectively.
In our real world we generally have $H_h = H_b$ and $J_h = J_b$. But set the following values:
$H_h = 0$, $H_b = 1$, $J_h = 1$, $J_b = 0$,
and you have the god Anubis.
Similarly, what is a dragon? Well, it’s an entity such that the following are all true: “is a reptile”, “is gigantic”, “is flying”, “breathes fire”.
And, even though those features never go together in the real world, we can put them together in our imagination, and get a dragon.
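Sticking with the toy formalism, here is a sketch of both feature assignments next to a crude “seen in the real world” test (the feature names and the rules are my own, chosen to match the text):

```python
# Feature assignments as plain dicts of booleans (names are illustrative).
anubis = {"human_head": False, "human_body": True,
          "jackal_head": True, "jackal_body": False}

dragon = {"is_reptile": True, "is_gigantic": True,
          "is_flying": True, "breathes_fire": True}

def seen_in_real_world(entity: dict) -> bool:
    """Crude real-world regularities: heads match bodies of the same species,
    and nothing gigantic flies or breathes fire."""
    heads_match_bodies = (
        entity.get("human_head", False) == entity.get("human_body", False)
        and entity.get("jackal_head", False) == entity.get("jackal_body", False)
    )
    physically_mundane = (
        not (entity.get("is_gigantic", False) and entity.get("is_flying", False))
        and not entity.get("breathes_fire", False)
    )
    return heads_match_bodies and physically_mundane

print(seen_in_real_world(anubis))  # False -- yet imagination accepts Anubis
print(seen_in_real_world(dragon))  # False -- and dragons too
```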
Note that “is flying” seems more fundamental to a dragon than “has wings”, thus all the wingless dragons that fly “by magic”[1]. Our imaginations seem comfortable with such combinations.
Dreams are always bewildering upon awakening, because they also combine contradictory assumptions. But these combinations are often beyond what our imaginations are comfortable with, so we get things like meeting your mother—who is also a wolf—and handing Dubai to her over the tea cups (that contain milk and fear).
“Alice in Wonderland” seems to be in between the wild incoherence of dream features and the more restricted inconsistency of stories and imagination.
[1] Not that any real creature that size could fly with those wings anyway.
Sorta related: my comment here
Thanks! Good insights there. Am reproducing the comment here for people less willing to click through:
Why do you think adversarial examples seem to behave this way? The pig equation seems equally compatible with fur or no fur recognized, wings or no wings. Indeed, it plausibly thinks the pig an airliner because it sees wings and no fur.
Then it has a wrong view of wings and fur (as well as a wrong view of pigs). The more features it has to get right, the harder the adversarial model is to construct—it’s not just moving linearly in a single direction.
Surely, the adversary convinces it this is a pig by convincing it that it has fur and no wings? I don’t have experience in how it works on the inside, but if the adversary can magically intervene on each neuron, changing its output by d by investing d² effort, then the proper strategy is to intervene on many features a little. Then if there are many layers, the penultimate layer containing such high level concepts as fur or wings would be almost as fooled as the output layer, and indeed I would expect the adversary to have more trouble fooling it on such low-level features as edges and dots.
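To spell out that effort argument (my notation, and a deliberately simple additive model of the attack): suppose the adversary needs a total change of $D$ spread across $n$ neurons, paying $d_i^2$ to change neuron $i$ by $d_i$. Then:

```latex
% Minimise total effort subject to a fixed total change D across n neurons.
% A Lagrange multiplier (or Cauchy-Schwarz) gives an even split:
\[
  \min_{d_1,\dots,d_n} \sum_{i=1}^{n} d_i^2
  \quad\text{s.t.}\quad \sum_{i=1}^{n} d_i = D
  \;\;\Longrightarrow\;\;
  d_i = \frac{D}{n}, \qquad
  \text{total effort} = n \cdot \frac{D^2}{n^2} = \frac{D^2}{n}.
\]
```

So under this toy model the cheapest attack does indeed perturb many features a little rather than a few features a lot, which is the commenter’s intuition.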
I’m wondering what the thesis of this post is.
Artwork doesn’t have to be about reality?
“How to think about features of models and about consistency”, in a relatively fun way as an intro to a big post I’m working on.