[QUESTION]: Looking for insights from machine learning that helped improve state-of-the-art human thinking
This question is a follow-up of sorts to my earlier question on academic social science and machine learning.
Machine learning algorithms are used for a wide range of prediction tasks, including binary (yes/no) prediction and prediction of continuous variables. For binary prediction, common models include logistic regression, support vector machines, neural networks, and decision trees and forests.
Now, I do know that methods such as linear and logistic regression, and other regression-type techniques, are used extensively in science and social science research. Some of this research looks at the coefficients of such a model and then re-interprets them.
I’m interested in examples where knowledge of the insides of other machine learning techniques (i.e., knowledge of the parameter values for which the models perform well) has helped provide insights that are of direct human value, or perhaps even directly improved unaided human ability. In my earlier post, I linked to an example (courtesy Sebastian Kwiatkowski) where the results of naive Bayes and SVM classifiers for hotel reviews could be translated into human-understandable terms (namely, reviews that mentioned physical aspects of the hotel, such as “small bedroom”, were more likely to be truthful than reviews that talked about the reasons for the visit or the company that sponsored the visit).
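To make the “look at the insides” idea concrete, here is a minimal sketch (my own toy illustration using scikit-learn and two made-up reviews, not the classifier from the linked post) of how the learned weights of a linear text classifier can be read off in human-understandable terms:

```python
# Toy sketch: train a linear text classifier and inspect which words push the
# decision toward "truthful" vs. "deceptive". Data and labels are made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

reviews = [
    "small bedroom but clean bathroom and comfortable bed",   # hypothetical truthful review
    "my husband and I stayed here for our anniversary trip",  # hypothetical deceptive review
]
labels = [1, 0]  # 1 = truthful, 0 = deceptive (toy labels)

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(reviews)
clf = LinearSVC().fit(X, labels)

# Positive weights push the prediction toward "truthful", negative toward "deceptive".
weights = sorted(zip(clf.coef_[0], vectorizer.get_feature_names_out()))
print("most 'deceptive' words:", weights[:5])
print("most 'truthful' words:", weights[-5:])
```

With a real labeled corpus, the top-weighted words are exactly the kind of human-readable evidence described above.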
PS: Here’s a very quick description of how these supervised learning algorithms work. We first postulate a functional form that describes how the output depends on the input. For instance, the functional form in the case of logistic regression outputs the probability as the logistic function applied to a linear combination of the inputs (features). The functional form has a number of unknown parameters. Specific values of the parameters give specific functions that can be used to make predictions. Our goal is to find the parameter values.
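For concreteness, here is that functional form in code; the feature and parameter values below are made up, and the learning problem is precisely to choose w and b from data:

```python
import numpy as np

def logistic_prediction(x, w, b):
    """Logistic regression functional form: P(y=1 | x) = sigmoid(w.x + b)."""
    z = np.dot(w, x) + b             # linear combination of the features
    return 1.0 / (1.0 + np.exp(-z))  # logistic (sigmoid) function

# Hypothetical feature and parameter values, just to show the shape of the function.
print(logistic_prediction(x=np.array([2.0, -1.0]), w=np.array([0.5, 1.2]), b=-0.3))
```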
We use a huge amount of labeled training data, plus a cost function (which itself typically arises from a statistical model for the nature of the error distribution) to find the parameter values. In the crudest form, this is purely a multivariable calculus optimization problem: choose parameters so that the total error function between the predicted function values and the observed function values is as small as possible. There are a few complications that need to be addressed to get to working algorithms.
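Here is a sketch of that crudest form for logistic regression: plain gradient descent on the cross-entropy cost, on a tiny made-up dataset, with none of the complications included:

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, steps=1000):
    """Crude gradient descent on the cross-entropy cost for logistic regression."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        grad_w = X.T @ (p - y) / len(y)         # gradient of the cost w.r.t. w
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Tiny made-up dataset: one feature, label is 1 when the feature is large.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
print(fit_logistic(X, y))
```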
So what makes machine learning problems hard? There are a few choice points:
(1) Feature selection: figuring out the inputs (features) to use in predicting the outputs.
(2) Selection of the functional form of the model.
(3) Selection of the cost function (error function).
(4) Selection of the algorithmic approach used to optimize the cost function, addressing the issue of overfitting through appropriate methods such as regularization and early stopping.
Of these steps, (1) is really the only step that is somewhat customized by domain, but even here, when we have enough data, it’s more common to just throw in lots of features and see which ones actually help with prediction (in a regression model, the features that have predictive power will have nonzero coefficients in front of them, and removing them will increase the overall error of the model). (2) and (3) are mostly standardized, with our choice really being between a small number of differently flavored models (logistic regression, neural networks, etc.). (4) is the part where much of the machine learning research is concentrated: figuring out newer and better algorithms to find (approximate) solutions to the optimization problems for particular mathematical structures of the data.
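To illustrate the point about throwing in lots of features, here is a sketch on synthetic data where only two of ten features matter; with an L1 penalty (one common regularization choice), most of the irrelevant coefficients are driven to exactly zero, so the fitted coefficients themselves do the feature selection:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + 2 * X[:, 1] > 0).astype(int)  # only the first two features matter

# L1-regularized logistic regression; irrelevant features tend to get zero coefficients.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
print(np.round(clf.coef_[0], 2))  # most of the eight irrelevant coefficients should be 0
```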
You may be interested in this white paper by a Google engineer using a neural network to predict power consumption for their data centers with 99.6% accuracy.
http://googleblog.blogspot.com/2014/05/better-data-centers-through-machine.html
Looking at the internals of the model, he was able to determine how sensitive the power consumption was to various factors. Three examples were given of how the new model let them optimize power consumption. I’m a total newbie to ML, but this is one of the only examples I’ve seen of the pipeline: predictive model → optimization.
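To give a flavor of the sensitivity idea (this is not the Google model, just a generic sketch on made-up data): train a regressor, then nudge one input at a time and see how much the prediction moves.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Made-up data standing in for factors like load, outside temperature, pump speed.
rng = np.random.default_rng(1)
X = rng.uniform(size=(1000, 3))
y = 1.1 + 0.5 * X[:, 0] + 0.3 * X[:, 1] ** 2 + rng.normal(scale=0.01, size=1000)

model = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0).fit(X, y)

# Perturb one factor at a time and see how the predicted output responds.
x0 = np.array([[0.5, 0.5, 0.5]])
for i, name in enumerate(["factor_1", "factor_2", "factor_3"]):
    x1 = x0.copy()
    x1[0, i] += 0.1
    print(name, "sensitivity:", model.predict(x1)[0] - model.predict(x0)[0])
```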
Here’s another example you might like, from the Kaggle cause-effect pairs challenge. The winning model was able to classify whether A->B or B->A with an AUC of over 0.8, which is better than some medical tests. A writeup and code were provided by the top three Kagglers.
http://clopinet.com/isabelle/Projects/NIPS2013/
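For anyone unfamiliar with the metric: AUC is the probability that the model ranks a randomly chosen positive example above a randomly chosen negative one, so 0.5 is chance and 1.0 is perfect. A quick sketch with made-up labels and scores:

```python
from sklearn.metrics import roc_auc_score

true_direction  = [1, 1, 0, 1, 0, 0, 1, 0]                    # 1 = "A causes B", 0 = "B causes A"
predicted_score = [0.9, 0.4, 0.3, 0.6, 0.7, 0.2, 0.8, 0.5]    # model's confidence that A -> B
print(roc_auc_score(true_direction, predicted_score))          # about 0.81 for these toy scores
```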
Thanks, both of these look interesting. I’m reading the Google paper right now.
I think it’s worth including inference on the list of things that make machine learning difficult. The more complicated your model is, the more computationally difficult it will be to do inference in it, meaning that researchers often have to limit themselves to a much simpler model than they’d actually prefer to use, in order to make inference actually tractable.