Basic statistics
Be careful here. Statistical intuition does not come naturally to humans; Kahneman and others have written extensively about this. Learning some mathematical facts (relatively simple to do) without learning the correct statistical intuitions (hard to do) may well have negative utility. Unjustified self-confidence is an obvious outcome.
Can you elaborate? What is the difference between “mathematical facts” and “statistical intuitions”? Can you give an example of each?
If you take the average introductory statistics textbook, it tells you things that are true for normally distributed data.
If you are faced with a real-world problem that doesn’t follow the normal distribution and try to apply statistical techniques proven to work for normally distributed data, you will make mistakes.
Being good at statistical modelling means that you have an idea of what assumptions you can make about a certain data set and the kind of errors you will get when your assumptions don’t match reality.
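To make that concrete, here is a rough simulation sketch (my own illustration, not something from a textbook or from the comment above). It builds the usual 95% t-interval for the mean from samples of size 10 and checks how often the interval contains the true mean, once for normal data and once for strongly skewed lognormal data. The sample size, the lognormal parameters, the seed and the function names are all arbitrary assumptions.

```python
import math
import random
import statistics

N = 10               # sample size (assumed for illustration)
T_CRIT_DF9 = 2.262   # two-sided 95% critical value of Student's t, 9 degrees of freedom
TRIALS = 5000


def t_interval_covers(sample, true_mean):
    """Return True if the textbook 95% t-interval for the mean contains true_mean."""
    m = statistics.mean(sample)
    half_width = T_CRIT_DF9 * statistics.stdev(sample) / math.sqrt(len(sample))
    return m - half_width <= true_mean <= m + half_width


def coverage(draw, true_mean, seed=0):
    """Fraction of simulated samples whose t-interval contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(TRIALS):
        sample = [draw(rng) for _ in range(N)]
        hits += t_interval_covers(sample, true_mean)
    return hits / TRIALS


# Normal data: the assumption behind the interval holds.
normal_cov = coverage(lambda rng: rng.gauss(0.0, 1.0), true_mean=0.0)

# Lognormal data with sigma = 1.5: heavily right-skewed, true mean is exp(sigma^2 / 2).
skewed_cov = coverage(
    lambda rng: rng.lognormvariate(0.0, 1.5),
    true_mean=math.exp(1.5 ** 2 / 2),
)

print(f"coverage on normal data:    {normal_cov:.3f}")
print(f"coverage on lognormal data: {skewed_cov:.3f}")
```

The interval is computed the same way in both cases; only the match between the assumption and the data changes. On the skewed data the nominal "95%" interval should miss the true mean noticeably more often than 5% of the time.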
Example of a mathematical fact: a formula for calculating correlation coefficient. Example of a statistical intuition: knowing when to conclude that close-to-zero correlation implies independence. (To see the problem, see this picture for some datasets in which variables are uncorrelated, but not independent.)
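A small sketch of the same point in code, assuming nothing beyond the Python standard library. Here y is a deterministic function of x (so the two are as dependent as they can be), yet the Pearson correlation comes out close to zero because the relationship is symmetric rather than linear. The `pearson` helper and the choice of y = x² are my own illustration.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(10_000)]
ys = [x * x for x in xs]  # y is completely determined by x

r_raw = pearson(xs, ys)                       # near zero: no linear trend
r_abs = pearson([abs(x) for x in xs], ys)     # large: the dependence was there all along

print(f"corr(x, x^2)   = {r_raw:.3f}")
print(f"corr(|x|, x^2) = {r_abs:.3f}")
```

The formula is the "mathematical fact"; knowing to plot the data or try a transformation before reading "r ≈ 0" as "unrelated" is the part that takes practice.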
Not sure why you are calling this “intuition”. Understanding that Pearson correlation attempts to measure a linear relationship and many relationships are not linear is just statistical knowledge, only a bit higher level than knowing the formula.
Something to consider: what’s a good field in which to learn basic statistics (sticking with the “learn by doing, when possible” theme)?
If you want to learn statistics by doing, try to do what gwern does.
Or do the complete opposite.
The impression I get of gwern is that he reads widely, thinks creatively, and experiments frequently, so he is constantly confronted with hypotheses that he has encountered or has generated. His use of statistics is generally confirmatory, in that he’s using data to filter out unjustified hypotheses so he can further research or explore or theorize about the remaining ones.
Another thing you can do with data is exploratory data analysis, using statistics to pull out interesting patterns for further consideration. The workflow for this might look more like:
Acquire (often multivariate) data from another researcher, source, or experiment.
Look at its marginal distributions to check your understanding of the system and catch really obvious outliers.
Maybe use tools like mixture modeling or Box-Cox transformation to clarify marginal distributions.
Use statistical tools like (linear, logistic, support vector, etc.) regression, PCA, etc., to find patterns in the data.
Do stuff with the resulting patterns: think up mechanisms, do confirmatory analysis, check literature, show them to other people, etc.
A lot of what you get out of this process will be spurious, but seeing hypotheses that the data seemed to support go down in flames is a good way to convince yourself of the value of confirmatory analysis, and of tools for dealing with this multiple testing problem.
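If you want to watch this happen without waiting for real data, a toy simulation along the following lines may help. It is only a sketch under assumed settings: 400 candidate predictors that are pure noise, 30 observations each, and the standard critical value |r| > 0.361 for p < 0.05 (two-sided) at n = 30. The "exploratory" pass flags whatever looks significant; the "confirmatory" pass re-tests only the flagged predictors on fresh data.

```python
import random

N_OBS = 30           # observations per variable (assumed)
N_PREDICTORS = 400   # number of pure-noise candidate predictors (assumed)
R_CRIT = 0.361       # |r| threshold for p < 0.05, two-sided, with n = 30


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def significant_predictors(rng, candidates):
    """Draw a fresh outcome and fresh noise predictors; return the candidates that pass the test."""
    y = [rng.gauss(0.0, 1.0) for _ in range(N_OBS)]
    hits = []
    for j in candidates:
        x = [rng.gauss(0.0, 1.0) for _ in range(N_OBS)]  # unrelated to y by construction
        if abs(pearson(x, y)) > R_CRIT:
            hits.append(j)
    return hits


rng = random.Random(1)
explore = significant_predictors(rng, range(N_PREDICTORS))  # exploratory pass
confirm = significant_predictors(rng, explore)              # confirmatory pass on new data

print(f"flagged in exploration: {len(explore)} of {N_PREDICTORS}")
print(f"still significant on fresh data: {len(confirm)} of {len(explore)}")
```

Roughly 5% of the noise predictors should get flagged in the first pass, and most of those should fail on the second, which is the "go down in flames" experience in miniature.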
I remember Gelman saying useful stuff like this, but it’s been a while since I read that post so I might be mischaracterizing it.
(Ilya, you know all of this, surely at a deeper level than I do. I’m just rhetorically talking to you as a means to dialogue at Capla. Gwern, hopefully my model of you is not too terrible.)
I want to do that. Tell me how. I think I already read widely (at least compared to my meat-space peers and possibly compared to the typical LW reader), but I can do better. I am frequently complimented for asking creative questions, coming up with unusual ideas and solutions (again, in comparison to non-rationalists), but if there are ways to do this better, I want to hear them. However, I want to make regular experimentation a part of my life and don’t really know how. I’m interning with a psych lab, and hope to work with some behavioral economists who run field experiments.
How do I gain proficiency with experimental methods and build the habit of running simple experiments regularly? I suppose that there’s a certain kind of phenomenon that to the educated mind is automatically flagged as ripe for experimentation (I’m thinking of Feynman’s curiosity about the ants in his room or Harry James Potter-Evans-Verres testing to find out what the optimal way to fight is, prior to the first battle), but I don’t have that intuition yet.
Suggestions?
That’s usually called “data mining” and is a popular activity. Unfortunately many people think that’s all they need and stop before the confirmatory phase.
What does gwern do?
This.
Political Science! Since you’re interested in election dynamics, 538’s description of its model is a good place to get a punches-pulled look at how a statistical model is constructed.
I’ll side-step the “field” part of the question and instead point to an undergraduate lecture course on data analysis which has some online notes and a series of exercises.
It’s worth pointing to something specific one could immediately start working on, because I think people underrate the trivial inconvenience of not knowing which specific book or course to consult. The course linked is not that basic, admittedly, but even if it’s too advanced it should help highlight specific keywords & terms to look up on Google, Wikipedia, or textbooks.