It might be that I have gotten too cynical, but if you measure 6 variables it’s more likely that one of them gets a statistically significant result than if you first turn those 6 variables into 2 variables via PCA.
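To put rough numbers on that intuition, here is a minimal sketch. It assumes the tests are independent and each uses a 0.05 threshold; neither assumption comes from the dataset under discussion, and the function name is just illustrative. Under those assumptions, the chance of at least one spurious "significant" result is about 26% for 6 tests versus about 10% for 2.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across n_tests independent
    tests, each run at significance level alpha, when every null is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Assumption: 6 raw variables tested separately vs. 2 principal components.
six_raw = familywise_error_rate(6)
two_pcs = familywise_error_rate(2)

print(f"6 separate tests: {six_raw:.1%} chance of a false positive")
print(f"2 separate tests: {two_pcs:.1%} chance of a false positive")
```

If the 6 variables are correlated with each other, the real figure for the raw variables is lower than the independent-case number, so this is closer to a worst case than a precise estimate.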
My project gives a proof of concept for what I’m talking about in the context of social psychology. I’ve never seen such an application. So no, it’s not just the realization that it could be applied, it’s also giving a proof of concept: that’s why it took ~1500 hours rather than ~10 hours.
That’s probably where there’s something I don’t understand. I don’t understand why the analysis took ~1500 hours. Spending that much time with a dataset also instinctively triggers “fishing expedition” in my head. I don’t know to what extent that’s warranted.
I’m not sure you have shown that it makes more sense to interpret that individual-preference factor as being about intelligence and sincerity than as being about the value of fun.
As far as I can see it could also be that fun & physical attractiveness are simply more valued.
So I’d strongly encourage you to pursue your ideas more. I’ve been looking some at the General Social Survey data, where I haven’t yet found something highly nontrivial (maybe I’m looking at the data the wrong way, or maybe it’s just not a good dataset for this). I’d be happy to share my code with you / a cleaned form of the data, if you’re interested in exploring factors for political labels.
In the case of spending effort on the GSS, I can’t envision what success looks like. It’s straightforward to find PCR factors, but I don’t know how to put them to good use.
A more interesting project would be to explore LW’s ideological landscape.
I would be very interested in how various rationalist beliefs interact with each other.
Does seeing yourself as an “aspiring rationalist” correlate with beliefs on UFAI risk?
A project that searches for the main dimensions of disagreement in this community would be valuable.
Maybe 300 questions that are answered on a Likert scale. Maybe 150 rationality questions, 100 Big Five questions and 50 autism questions.
It might be that I have gotten too cynical, but if you measure 6 variables it’s more likely that one of them gets a statistically significant result than if you first turn those 6 variables into 2 variables via PCA.
That’s probably where there’s something I don’t understand. I don’t understand why the analysis took ~1500 hours. Spending that much time with a dataset also instinctively triggers “fishing expedition” in my head. I don’t know to what extent that’s warranted.
The issue of multiple hypothesis testing is precisely why it took 1500 hours :-). I was dealing with the general question “how can you find the most interesting generalizable patterns in a human-interpretable data set?” It’ll take me a long time to externalize what I learned.
For now I’ll just remark that dimensionality reduction reduces concerns around multiple hypothesis testing. If you have a cluster of variables A and a cluster of variables B and you suspect that there’s some relationship between the two, you can do PCA on the two clusters separately, then look at correlations between the first few principal components rather than looking at all pairwise correlations between variables in A and variables in B.
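A minimal sketch of that procedure, using NumPy on synthetic data. The data, the shared latent factor, the cluster sizes (6 variables each), the choice of k = 2, and the helper name `first_components` are all assumptions made for illustration; this is not the code or data from the actual analysis.

```python
import numpy as np


def first_components(X: np.ndarray, k: int) -> np.ndarray:
    """Scores of each row of X on its first k principal components.

    Columns are standardized first, so this is PCA on the correlation matrix.
    """
    Xc = X - X.mean(axis=0)
    Xc = Xc / Xc.std(axis=0, ddof=1)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


# Synthetic example: two clusters of 6 variables driven by one shared factor.
rng = np.random.default_rng(0)
n = 500
latent = rng.normal(size=n)
A = latent[:, None] * rng.uniform(0.5, 1.0, size=6) + rng.normal(size=(n, 6))
B = latent[:, None] * rng.uniform(0.5, 1.0, size=6) + rng.normal(size=(n, 6))

k = 2
pcs_A = first_components(A, k)
pcs_B = first_components(B, k)

# Cross-correlations between components of A (rows) and components of B (columns).
corr = np.corrcoef(pcs_A, pcs_B, rowvar=False)[:k, k:]

n_tests_reduced = corr.size                   # k * k comparisons
n_tests_pairwise = A.shape[1] * B.shape[1]    # all variable pairs

print(corr.round(2))
print(f"{n_tests_reduced} comparisons instead of {n_tests_pairwise}")
```

Note that the sign of each principal component is arbitrary, so only the magnitude of each correlation is meaningful here.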
A more interesting project would be to explore LW’s ideological landscape. I would be very interested in how various rationalist beliefs interact with each other. Does seeing yourself as an “aspiring rationalist” correlate with beliefs on UFAI risk?
There is the 2014 LW survey data, which is interesting, even if less substantive than what you have in mind. I have an unfinished project that I’m doing with it (got bogged down in cleaning it to make it nicely readable).
Yes, this is the point :-)