Every point you made (0)-(5) is correct!
(0) There are some social scientists, especially in political science, who focus on applying machine learning and text mining methods to political texts. This is a big movement that goes under the heading “text as data”. Most publications use fairly simple methods, basically calibrated regressions, but a lot of thought goes into choosing them, and some of the people publishing are mathematically sophisticated.
Example: http://www.justingrimmer.org/
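For a flavor of what these pipelines look like, here is a minimal sketch in Python, assuming scikit-learn; the documents and party labels are invented placeholders, not anyone's actual corpus or code:

```python
# A minimal "text as data" sketch: bag-of-words features feeding a simple
# regularized regression. Documents and labels are made up for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = [
    "we must cut taxes and shrink the federal government",
    "expand public healthcare and protect union workers",
    "secure the border and strengthen national defense",
    "invest in green energy and public transit",
]
labels = [0, 1, 0, 1]  # hypothetical party labels

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # unigram/bigram counts, tf-idf weighted
    LogisticRegression(max_iter=1000),    # the "fairly simple" regression step
)
model.fit(docs, labels)

# Interpretation mostly means reading off which terms carry the weight.
terms = model.named_steps["tfidfvectorizer"].get_feature_names_out()
weights = model.named_steps["logisticregression"].coef_[0]
for w, t in sorted(zip(weights, terms))[:3] + sorted(zip(weights, terms))[-3:]:
    print(f"{t:30s} {w:+.3f}")
```

The modeling step really is this plain; the effort in the published work goes into corpus construction, label validation, and interpreting the fitted weights.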
Another prominent example comes from social network analysis, where people from the CS and physics worlds work on the social side, and some social scientists have adopted the methodology too.
Example: http://cs.stanford.edu/people/jure/
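As a toy illustration of the kind of measures this community computes, here is a sketch in Python assuming networkx, with its built-in Zachary karate club graph standing in for a real social network:

```python
# Standard social-network measures on a small canned graph.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.karate_club_graph()  # a classic 34-person friendship network

# Which individuals sit on the most shortest paths between others?
betweenness = nx.betweenness_centrality(G)
top5 = sorted(betweenness, key=betweenness.get, reverse=True)[:5]
print("Most central members:", top5)

# Rough community structure via greedy modularity maximization.
for i, members in enumerate(greedy_modularity_communities(G)):
    print(f"community {i}: {sorted(members)}")
```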
At the Santa Fe Institute, people from all kinds of disciplines work on all kinds of things, but an overall theme is methods drawn from math and physics applied to the social sciences. This includes networks, statistical physics, and game theory.
Not exactly social science, but Jennifer Dunne applies network analysis to food webs: http://www.santafe.edu/about/people/profile/Jennifer%20A.%20Dunne
I am certain that cutting-edge mathematics and ML are applied in pockets of econometrics too. Finance is often housed in economics departments and ML has thoroughly invaded it, though I admit that's a stretch.
(1) Social science academics have only recently gained access to large datasets. Especially in survey-based fields like sociology and experimental psychology, small-data-oriented methods are definitely the focus. Large datasets include medical data, to the extent that researchers can get access; massive text repositories, including academic paper databases and online corpora; and a very few surveys with the size and depth to support fancier analyses.
This constraint applies less to probit and more to clustering, Bayes nets, decision trees, etc.
(2) The culture is definitely conservative. I've talked to many people interested in more advanced methods, and they have to fight harder to get published, but the tide is changing.
(3) Absolutely. It's very hard to figure out what coefficients represent when the data are ambiguous, when many factors are highly correlated (as they are in social science), and when the model is quite possibly misspecified. Clusterings that score highly under most methods can be completely spurious, and it takes advanced statistical knowledge to identify this. ML is good for prediction and classification, but that is very rarely the goal of social scientists (though one can imagine how it could be). SVMs and decision trees do a poor job of extracting causal relationships with any certainty.
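To make the spurious-clustering point concrete, here is a small sketch (Python, assuming scikit-learn): k-means will happily partition data that has no cluster structure at all, and a popular fit measure still comes out looking respectable:

```python
# k-means finds "clusters" in pure noise, and the silhouette score does not
# obviously give the game away.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 2))  # uniform noise: no clusters by construction

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(silhouette_score(X, labels))  # typically ~0.3-0.4 despite zero real structure
```

Catching this requires comparing against a null model (e.g. re-clustering shuffled or simulated structureless data), which is exactly the kind of check that doesn't come built into most ML toolboxes.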
(4) Again, the culture is conservative and many don't have this training. A good number know their way around R, though, and newer researchers often come in with quite a bit of stats/CS knowledge. The amount of statistical knowledge in the social sciences is growing fast.
(5) Yes; this is especially true of something like neural networks.