How do I account for how many models I’ve tested? No, really, I don’t know what that’d even be called in the statistics literature, and it seems like if a general technique for doing this were known the big data people would be all over it.
What we’re doing at the FHI is treating it like a machine learning problem: splitting the data into a training set and a testing set, exploring as much as we want on the training set, formulating the hypotheses, then testing them on the testing set.
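To make the split concrete, here is a minimal sketch in plain Python. The function name `split_train_test`, the 30% test fraction, and the toy `rows` list are assumptions for illustration, not anything specific to the FHI workflow.

```python
import random


def split_train_test(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of `rows` and split it once into (train, test)."""
    rng = random.Random(seed)        # fixed seed so the split is reproducible
    shuffled = list(rows)            # copy, so the caller's data is untouched
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical data: 100 observations identified by index.
rows = list(range(100))
train, test = split_train_test(rows)

# Explore as much as you like on `train`. Write the hypotheses down,
# then evaluate each one exactly once on `test`.
```

The point of the design is that the test set is only ever touched after the hypotheses are fixed, so the number of models tried on the training set does not contaminate the final test.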
The Bayesian approach with multiple models seems to be exactly what we need, e.g. http://www.stat.washington.edu/raftery/Research/PDF/socmeth1995.pdf
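One common way to put the multiple-model idea into practice is to turn each candidate model's BIC into an approximate posterior model probability. The sketch below assumes equal prior probability for every model and uses the standard approximation that the weight is proportional to exp(-BIC/2). The function name and the BIC values are made up for illustration.

```python
import math


def bic_model_weights(bics):
    """Approximate posterior model probabilities from BIC values.

    Assumes equal prior probability for each model and the usual
    approximation weight_k proportional to exp(-BIC_k / 2).
    """
    best = min(bics)
    # Subtract the best BIC first so the exponentials do not underflow.
    raw = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical BIC values for three candidate models.
weights = bic_model_weights([210.4, 212.1, 219.8])
```

Because every candidate model gets a share of the probability, having looked at many models is accounted for in the weights rather than ignored.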
Another approach seems to be stepwise regression: http://en.wikipedia.org/wiki/Stepwise_regression
I see a lot of stepwise regression being used by non-statisticians, but I think statisticians themselves consider it something of a joke. If you have more predictors than you can fit coefficients for, and want an understandable linear model, you are better off with something like LASSO.
Edit: Don’t just take my word for it; Google found this blog post for me: http://andrewgelman.com/2014/06/02/hate-stepwise-regression/
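For anyone who hasn't seen LASSO, here is a rough sketch of the idea using cyclic coordinate descent in plain Python. It minimises (1/(2n))·||y − Xb||² + alpha·||b||₁, and it assumes the predictors are already centred and on comparable scales, with no intercept. The function names, the simulated data, and `alpha=0.2` are all assumptions for illustration; in practice you would use an established library and choose alpha by cross-validation.

```python
import random


def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`; exactly zero if it is small."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/(2n))*||y - Xb||^2 + alpha*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    residual = list(y)  # equals y - X @ coef, and coef starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that ignores j's own fit.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * coef[j])
                      for i in range(n)) / n
            new = soft_threshold(rho, alpha) / col_sq[j]
            delta = new - coef[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                coef[j] = new
    return coef


# Simulated data: 10 predictors, only the first two matter.
rng = random.Random(0)
n_obs, n_pred = 200, 10
X = [[rng.gauss(0, 1) for _ in range(n_pred)] for _ in range(n_obs)]
y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0, 0.5) for row in X]

coef = lasso_coordinate_descent(X, y, alpha=0.2)
```

The L1 penalty shrinks every coefficient and sets the weak ones exactly to zero, so variable selection happens inside a single fit rather than through a sequence of significance tests.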
I concur. Stepwise regression is a very crude technique.
I find it useful as an initial filter if I have to dig through a LOT of potential predictors, but you can’t rely on it to produce a decent model.
So it wasn’t as clear with the previous link, but it seems to me that the nth step of this method doesn’t condition on the fact that the previous n-1 steps failed.
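A small simulation of a related concern, under assumptions of my own: the outcome and all predictors are pure noise, and a single forward-selection step picks the predictor with the largest absolute correlation. The cutoff 0.279 is the approximate two-sided 5% critical value of |r| for 50 observations. Function names and settings are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def first_step_false_positive_rate(n_predictors, n_obs=50, n_trials=400,
                                   r_crit=0.279, seed=0):
    """Fraction of pure-noise trials where the best predictor looks 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        y = [rng.gauss(0, 1) for _ in range(n_obs)]
        # Forward selection's first step: keep the strongest-looking predictor.
        best = max(
            abs(pearson_r([rng.gauss(0, 1) for _ in range(n_obs)], y))
            for _ in range(n_predictors)
        )
        if best > r_crit:
            hits += 1
    return hits / n_trials


rate_one = first_step_false_positive_rate(1)      # a single pre-chosen predictor
rate_twenty = first_step_false_positive_rate(20)  # best of 20 noise predictors
```

With one pre-chosen predictor the rate should sit near the nominal 5%, while picking the best of 20 pushes it far higher. A threshold that ignores the search that came before it is not testing what it claims to test.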