LDL 7: I wish I had a map
When I was an undergraduate considering graduate school in mathematics, one thing I knew a reasonable amount about, and spent a good amount of time thinking about, was what the field of math looked like overall.
In very, very broad strokes, I knew that geometry, algebra, and analysis were generally different things pursued by different people. I knew that these three were very mature sets of tools at least as much as active research areas, and that other areas of math which might be more naively “interesting,” such as chaos theory or number theory or topology, were generally divided up in the research community based on how amenable certain problems were to tools from each of those three core branches and how much the practitioners liked using each type of tool. I also knew that these tools weren’t mutually exclusive, and that a lot of the interesting progress in my chosen subfield of number theory was driven by rapid movement and conversion between them: for example, viewing an equation as related to a curve geometrically, placing the geometry of that curve in an algebraic structure, and then using analysis to find some solution in this space and trace it back to a solution of the original equation.
While this is a long way from being sufficient for doing mathematics, I do think this conception of what mathematics is matters a great deal. In particular, if I had a question and didn’t know whether the answer was known, or whether there were techniques that might help, I had a good sense of whom to ask and which books to look in. It was a sort of technical academic equivalent of Google Fu: understanding the field well enough to search through it efficiently.
I do not have this sense for deep learning research.
I know there are three fundamental problems in machine learning: supervised, unsupervised, and reinforcement learning. I know that supervised learning problems have many, many avenues of attack, that reinforcement learning has made progress in large part by trying to reduce itself (via something like deep approximate Q-learning) to a supervised learning task, and that unsupervised learning is difficult because it’s unclear how to do this reduction.
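A minimal sketch of what that reduction can look like, assuming the usual one-step Q-learning target. The names `q_learning_targets` and `q_values` are hypothetical, and `q_values` stands in for whatever approximator (for example, a neural network) returns one estimated value per action. The point is only that each observed transition becomes an ordinary (input, target) regression example.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]
# One observed step: (state, action, reward, next_state, done)
Transition = Tuple[State, int, float, State, bool]


def q_learning_targets(
    transitions: Sequence[Transition],
    q_values: Callable[[State], List[float]],  # hypothetical approximator
    gamma: float = 0.99,
) -> List[Tuple[Tuple[State, int], float]]:
    """Turn transitions into supervised (input, target) regression pairs."""
    examples = []
    for state, action, reward, next_state, done in transitions:
        # Bootstrap from the current estimate of the best next action,
        # unless the episode ended at this step.
        bootstrap = 0.0 if done else gamma * max(q_values(next_state))
        examples.append(((state, action), reward + bootstrap))
    return examples
```

A regression model would then be fit to these pairs and the targets recomputed as its estimates change. That moving target is one of the places where the analogy to plain supervised learning breaks down.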
But if I have a question like, “how can I use transfer learning to apply knowledge from a large dataset to a small dataset?” I’m going to have a lot of trouble answering it unless I can guess the magical password: this is the “few-shot learning” problem, which is widely studied under that name. Even if I do get that far, I often don’t know how to identify the latest updates in the field. I managed mostly on my own to find a recent paper from DeepMind (though is 2016 really recent in this paradigm? How would I know?) detailing “matching networks,” which take bottlenecks from well-trained networks and then train LSTMs on one-shot learning tasks using those bottlenecks as features. That is actually very helpful for the application I have in mind. But when my task moves from a one-shot learning task to simply a small-data task (accumulating maybe 10 thousand examples before the model becomes obsolete), will this complete change of paradigm actually help me? And what if I also have a problem with strong imbalances in my data? Will I be able to layer on a solution I find by googling that problem, or will there be major interference? And it is going to be a pain to combine those codebases, since I will probably have to sort through code from two different grad students replicating two different experiments.
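To make the matching-networks idea above a little more concrete, here is a heavily simplified sketch. It assumes the bottleneck features have already been extracted by some pretrained network, and it leaves out the LSTM context embeddings entirely. The function names are hypothetical. What remains is the core step: a softmax over cosine similarities between a query and a small labelled support set.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, defined as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_predict(
    query: Sequence[float],
    support: List[Tuple[Sequence[float], str]],
) -> Dict[str, float]:
    """Label distribution for `query` given (features, label) support pairs."""
    sims = [cosine(query, feats) for feats, _ in support]
    top = max(sims)  # subtract the max before exponentiating, for stability
    weights = [math.exp(s - top) for s in sims]
    total = sum(weights)
    probs: Dict[str, float] = {}
    for w, (_, label) in zip(weights, support):
        # Attention weight of each support example is added to its label.
        probs[label] = probs.get(label, 0.0) + w / total
    return probs
```

With one support example per class this is one-shot classification. With more examples per class the same function still runs, but whether it remains a sensible choice at ten thousand examples is exactly the open question raised above.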
Of course, these sorts of difficulties are a reasonable thing to spend time working on, and reading other people’s code is an important part of being a software engineer (never mind that I’m not a software engineer). And the questions I have are the sorts of stupid questions that I would (now that I’ve spent a lot of time as a grad student) feel very comfortable asking if I were a grad student and there were professors around who had some reason to spend time talking to me.
But I am not a grad student, and I don’t have a lot of great opportunities for mentorship around me right now (though I’m hoping that may change soon). I’m not sure how to develop the intuition for finding the correct tools, for judging whether and how to start hitting things with the standard successful tools, or for efficiently modifying other people’s ideas and code to get the results I want.
And beyond that, even if I did have this intuition, it wouldn’t solve the problem for everyone else. Maybe there are resources on this subject that would be helpful, but I’m sure there aren’t online courses about it, and I don’t trust textbooks too deeply when it comes to cutting-edge CS research. And I’m not even sure if a book COULD convey what I’m talking about! Certainly it would be much harder for me to write in a way that conveys my intuition about academic math than it is for me to write in a way that explains specific math.
This does leave me with a question though, for readers who engage with deep learning: how did you develop your intuition for research in the field? What would you recommend for newer researchers?
You got to the end of the essay and went “down” into the details instead of “up” into the larger problem. Going up would be productive, I think, because this is an issue that comes up with every single field of human knowledge that exists, especially the long tail of specializations and options for graduate studies.
When you were an undergraduate and spent a lot of time thinking about the structure of mathematical knowledge, you were building a sort of map of all the maps that exist inside the books and minds of the community of mathematicians, with cues based on presumed structure in math itself, which everyone was studying in common.
Your “metamap of math” that you had in your head is not something shared by everyone, and I do not think that it would be easy for you to share (though maybe I’m wrong about your assessment of its sharability).
When I saw the title of your post, I thought to myself “Yes! I want a map of all of human knowledge too!” and I was hoping that I’d get pointers towards a generalization of your undergrad work in mathematics, except written down, shareable, and about “all of human knowledge”. But then I got to the end of your essay and, as I said, it went “down” instead of “up”… :-/
Anyway, for deep learning, I think a lot of the metamap comes from just running code to get a practical feel for it, because unlike math, the field is empirically based: people try lots of stuff, hide their mistakes, and then polish up the best thing they found to present as if they understand exactly how and why it worked.
For myself, I watch for papers to float by, look for code to download and try out, and personally get a lot from google-stalking smart people via their blogs.
The best starting point I know in this vein is a post by Ilya Sutskever (guest writing on Yisong Yue’s blog). It gives an overview of practical issues to keep in mind when trying to make deep learning models train and do well: the things that seem not to click when you think you have enough data, enough GPU, and a decent architecture, yet the models are still not working.
Maybe you could use this deep learning algorithm.