That’s probably where there’s something I don’t understand. I don’t understand why the analysis took ~1500 hours. Spending that much time with a dataset also instinctively triggers “fishing expedition” in my head. I don’t know to what extent that’s warranted.
The issue of multiple hypothesis testing is precisely why it took 1500 hours :-). I was dealing with the general question “how can you find the most interesting generalizable patterns in a human-interpretable data set?” It’ll take me a long time to externalize what I learned.
For now I’ll just remark that dimensionality reduction reduces concerns around multiple hypothesis testing. If you have a cluster of variables A and a cluster of variables B and you suspect that there’s some relationship between the two clusters, you can do PCA on each cluster separately, then look at correlations between the first few principal components rather than looking at all pairwise correlations between variables in A and variables in B. With k components kept from each cluster, that is k² correlations to test instead of |A| × |B|.
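To make the idea concrete, here is a minimal sketch in Python using only NumPy. It is not the analysis described above: the data is synthetic (both clusters are driven by one shared latent factor, which is an assumption made purely for illustration), and the names `first_pcs` and `pc_correlations` are hypothetical helpers, not from any library.

```python
import numpy as np

def first_pcs(X, k):
    """Scores of the rows of X on its first k principal components."""
    # Standardize each variable so no single scale dominates the PCA.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # PCA via SVD: columns of U * S are the principal component scores.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k] * S[:k]

def pc_correlations(A, B, k=2):
    """k x k matrix of correlations between the top-k PCs of A and of B."""
    pa, pb = first_pcs(A, k), first_pcs(B, k)
    # corrcoef treats rows as variables; keep only the A-vs-B block.
    return np.corrcoef(pa.T, pb.T)[:k, k:]

# Synthetic data (assumption): one latent factor plus independent noise.
rng = np.random.default_rng(0)
n = 500
latent = rng.normal(size=n)
A = latent[:, None] * rng.uniform(0.5, 1.0, size=10) + rng.normal(size=(n, 10))
B = latent[:, None] * rng.uniform(0.5, 1.0, size=8) + rng.normal(size=(n, 8))

corr = pc_correlations(A, B, k=2)
n_pairwise = A.shape[1] * B.shape[1]  # tests needed for all pairwise correlations
n_pc = corr.size                      # tests needed on principal components

print(f"pairwise tests: {n_pairwise}, PC tests: {n_pc}")
print(np.round(corr, 2))
```

Here 10 variables in A and 8 in B would mean 80 pairwise correlations, while keeping two components per cluster leaves only 4 to examine. The sign of each principal component is arbitrary, so only the magnitude of these correlations is meaningful.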
A more interesting project would be to explore LW’s ideological landscape. I would be very interested in how various rationalist beliefs interact with each other. Does seeing yourself as an “aspiring rationalist” correlate with beliefs on UFAI risk?
There is the 2014 LW survey data, which is interesting, even if less substantive than what you have in mind. I have an unfinished project that I’m doing with it (got bogged down in cleaning it to make it nicely readable).