How to Grow a Mind (video)
From the recent NIPS conference, here’s a talk by MIT cognitive scientist Josh Tenenbaum on what he calls “rich” machine learning. Should be of some relevance for people who are interested in AI or human cognitive development. I found it really interesting.
http://videolectures.net/nips2010_tenenbaum_hgm/
The gist is: children are able, from a young age, to learn the meanings of words from just a few examples. Adults shown pictures of abstract, made-up objects, and told that a few of them are called “tufas,” can pick out which of the other pictures are tufas and which aren’t. We can do this much faster than a typical Bayesian estimator can, and with less training data, partly because we have background knowledge about the world and what sorts of categories and structures it forms.
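To make the “faster than a typical Bayesian estimator” comparison concrete, here is a toy sketch of the kind of Bayesian concept learning Tenenbaum works with, using the “size principle”: if examples are sampled from the concept itself, then just a few of them strongly favor the smallest hypothesis that contains them. The hypothesis space and object names below are invented for illustration; this is not the model from the talk.

```python
# Toy Bayesian concept learning with the "size principle" (illustrative only:
# the hypotheses and objects are made up, not Tenenbaum's actual model).

HYPOTHESES = {
    # candidate meaning of the new word -> the set of objects it covers
    "tufas (one species)": {"tufa1", "tufa2", "tufa3"},
    "small furry things":  {"tufa1", "tufa2", "tufa3", "mouse", "vole"},
    "any animal":          {"tufa1", "tufa2", "tufa3", "mouse", "vole",
                            "cow", "dog", "trout"},
}
PRIOR = {name: 1.0 / len(HYPOTHESES) for name in HYPOTHESES}  # flat prior, for simplicity

def posterior(examples):
    """P(hypothesis | examples) under 'strong sampling': each example is drawn
    uniformly from the hypothesis's extension, so P(x | h) = 1 / |h|."""
    scores = {}
    for name, extension in HYPOTHESES.items():
        if all(x in extension for x in examples):
            scores[name] = PRIOR[name] * (1.0 / len(extension)) ** len(examples)
        else:
            scores[name] = 0.0  # ruled out: the hypothesis doesn't contain an example
    total = sum(scores.values())
    return {name: round(s / total, 3) for name, s in scores.items()}

examples = ["tufa1", "tufa2", "tufa3"]
for n in (1, 3):
    print(f"after {n} example(s): {posterior(examples[:n])}")
```

With one example all three hypotheses stay plausible; with three, the smallest hypothesis that covers them dominates, which is the flavor of generalizing a word from a handful of examples.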
For instance, we learn fairly young (~2 years) that non-solid objects are defined by substance rather than shape: that “toothpaste” is that sticky substance we brush our teeth with, whether it’s in the toothpaste tube or on the toothbrush or smeared on the sink, all very different shapes in terms of the raw images hitting our retinas. Pour a new liquid called “floo” into a glass, and we’ll predict that it’s still called “floo” when you spill it or while it’s splashing through the air. On the other hand, some objects are about shape more than color or texture: a chair is a chair no matter what it’s made of. Some sets of objects fall into a tree-like organization (the taxonomy of living things) and some fall into “flat” clusters without hierarchy (the ROYGBIV colors). It takes children three years or so to understand that the same object can be both a dog and a mammal. We learn over time which structural organization is best for which sorts of concepts in the world.
Research in machine learning/computer vision/statistics/related fields often focuses on optimizing the description of the data given an assumed form, not on giving the machine any “life experience” or “judgment” about which form is best. Clustering algorithms give the best way to sort data into clusters, if we think it falls into clusters; dimensionality reduction techniques give the best way to project data onto low-dimensional subspaces, if we think it lies in a subspace; manifold learning techniques give the best way to fit data onto low-dimensional manifolds, if we think it lies on a manifold. There’s less attention paid to how we identify the best structural model in the first place (and whether that identification can somehow be automated, moved from human judgment to machine judgment). See Jordan Ellenberg’s piece in Slate for more on this.
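As a crude illustration of what automating that judgment could look like, here is a sketch that fits two different structural hypotheses to the same data, a “clusters” model (Gaussian mixture) and a “low-dimensional subspace” model (factor analysis), and compares them by held-out log-likelihood. The synthetic data, the two model choices, and the scoring are mine, just to show the shape of the idea; the actual research on learning structural forms is much richer.

```python
# Let the machine compare the *form* of the model, not just its parameters:
# a "clusters" hypothesis vs. a "low-dimensional subspace" hypothesis,
# scored by held-out log-likelihood. (Sketch only.)
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic 5-D data that genuinely falls into three clusters.
data = np.vstack([rng.normal(loc=center, scale=0.3, size=(200, 5))
                  for center in (-2.0, 0.0, 2.0)])
train, test = train_test_split(data, test_size=0.5, random_state=0)

models = {
    "clusters (Gaussian mixture, 3 components)": GaussianMixture(n_components=3, random_state=0),
    "subspace (factor analysis, 1 factor)":      FactorAnalysis(n_components=1),
}
for name, model in models.items():
    model.fit(train)
    # Both models expose .score(): average log-likelihood per held-out sample,
    # so the two structural forms can be compared on a common scale.
    print(f"{name}: held-out log-likelihood = {model.score(test):.2f}")
```

On data like this the cluster model should score higher; on a single elongated Gaussian cloud, the subspace model would be the better description. That is the judgment we usually make by hand before choosing an algorithm.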
We usually don’t use completely different techniques in computer vision for identifying pictures of cows vs. pictures of trucks. But there’s evidence that the human brain does exactly that: it applies specialized techniques based on the experiential knowledge that cows, trucks, and faces are different sorts of objects, identified by different sets of features.
I’m sympathetic to Tenenbaum’s main point: that we won’t achieve computer counterparts to human learning and sensory recognition until we incorporate experiential knowledge and “learning to learn.” There is no single all-purpose statistical algorithm that Explains All Of Life; we’re going to have to teach machines to choose between algorithms based on context. That seems intuitively right to me, but I’d like to hear some back-and-forth on whether other folks agree.
Right, which is why I think the dimensionality reduction stuff, and in some sense all of machine learning, is kind of dishonestly packaged. These guys claim their algorithms are general, but that can’t really be true. There can never be a proof that a learning algorithm works for all situations. You can only prove statements of the form: “If the data exhibits property X, then algorithm Y will work.”
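A quick concrete version of that “property X” claim (my example, not the commenter’s): k-means assumes roughly compact, linearly separable clusters. On data with that property it does well; on concentric rings it fails, even though both are nominally “clustering problems.”

```python
# k-means works when the data has the property it assumes (compact blobs)
# and fails when it doesn't (concentric rings). Sketch for illustration.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs, make_circles
from sklearn.metrics import adjusted_rand_score

blobs_X, blobs_y = make_blobs(n_samples=400, centers=2, random_state=0)
rings_X, rings_y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

for name, (X, y) in {"blobs": (blobs_X, blobs_y), "rings": (rings_X, rings_y)}.items():
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    # Adjusted Rand index: 1.0 = perfect recovery of the true grouping, ~0 = chance.
    print(f"{name}: adjusted Rand index = {adjusted_rand_score(y, labels):.2f}")
```

The honest packaging is exactly the conditional statement above: k-means works if the clusters are blob-shaped.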
To live up to their promises, machine learning, computer vision, SNLP, and related fields need to become empirical sciences. They are currently strange hybrids of math and engineering (if you think they are empirical sciences, then what falsifiable predictions do they make? And if you don’t buy the falsifiability principle, then state an alternative answer to the demarcation problem).
I would submit that the real reason for this is that the judge of whether the human subject has appropriately learned the classification rule is also a human, and therefore shares the same inductive biases! If a non-human, near-blank-slate intelligence were the judge, it would rule that the human learned more slowly than a Bayesian estimator.
Right: effectively a shared prior for “what sorts of concepts go with words” (bias or no).
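Here is a rough way to see that shared-prior point in code (a toy setup of my own, not from the talk): with a near-blank-slate prior spread over every logically possible extension of the new word, three examples still leave real uncertainty about the exact concept; with a prior concentrated on a few “natural kind” hypotheses, the same three examples nearly settle it. A judge who didn’t share the natural-kinds prior would score the human learner as having jumped to a conclusion.

```python
# How much posterior probability lands on the "true" concept after three
# examples, under two different priors (hypothesis spaces). Toy illustration.
from itertools import combinations

OBJECTS = [f"obj{i}" for i in range(8)]
EXAMPLES = {"obj0", "obj1", "obj2"}            # the things we were told are "tufas"
NATURAL_KINDS = [frozenset({"obj0", "obj1", "obj2"}),
                 frozenset({"obj3", "obj4"}),
                 frozenset({"obj5", "obj6", "obj7"})]

def posterior_on_true_concept(hypotheses):
    """Posterior probability (flat prior over the given hypotheses, size-principle
    likelihood) that the word means exactly {obj0, obj1, obj2}."""
    weights = {h: (1.0 / len(h)) ** len(EXAMPLES)
               for h in hypotheses if EXAMPLES <= h}   # only hypotheses containing all examples survive
    return weights.get(frozenset(EXAMPLES), 0.0) / sum(weights.values())

# Near-blank-slate learner: every nonempty subset of the objects is a candidate meaning.
all_subsets = [frozenset(c) for r in range(1, len(OBJECTS) + 1)
               for c in combinations(OBJECTS, r)]

print("blank-slate prior:  ", round(posterior_on_true_concept(all_subsets), 3))
print("natural-kinds prior:", round(posterior_on_true_concept(NATURAL_KINDS), 3))
```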
Agreed that there is no secret to learning. Still, I think that the best models for computer learning will be very different from human learning, eliminating many of the biases humans have.
One thing to consider on this topic: humans use this type of learning not only because it is a good method, but also because humans have limited memory. Computer memory will soon exceed human memory, and then this may not be as much of an issue.