It seems to me that to make major contributions to human knowledge you need to do a lot more than say: “Hey, PCA is really great”. You actually have to understand the reasons why people aren’t using it and fix those reasons.
Have you read my speed dating project posts? I haven’t yet written up the most important one on demographics (I can do that soon, just many conflicting priorities), but the one on individual variation in revealed preferences for attractiveness vs intelligence and sincerity starts to get at what I’m talking about.
My project gives a proof of concept for what I’m talking about in the context of social psychology. I’ve never seen such an application. So no, it’s not just the realization that it could be applied, it’s also giving a proof of concept: that’s why it took ~1500 hours rather than ~10 hours.
As far as I can tell, the situation is simply that deep knowledge of the technique hasn’t yet percolated into the social psychology community, and people who do have the relevant background knowledge haven’t actually tried doing social psychology research. All you need is to notice something that’s been missed. There are many such things (see Peter Thiel’s discussion of how there are still secrets in his book “Zero to One”).
If I recall correctly, Freeman Dyson has indicated that his demonstration of the equivalence of the two different formulations of quantum electrodynamics isn’t as amazing as people believe, but was largely a function of him being one of the first people to learn both formulations! :-)
So I’d strongly encourage you to pursue your ideas more. I’ve been looking some at the General Social Survey data, where I haven’t yet found something highly nontrivial (maybe I’m looking at the data the wrong way, or maybe it’s just not a good dataset for this). I’d be happy to share my code with you / a cleaned form of the data, if you’re interested in exploring factors for political labels.
It might be that I have gotten too cynical, but if you measure 6 variables it’s more likely that one of them gets a statistically significant result than if you first turn those 6 variables into 2 variables via PCA.
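To put rough numbers on that intuition, here is a minimal sketch. It assumes the tests are independent and each is run at a 0.05 significance level; both are assumptions made for illustration, not something established in this thread. The function name `familywise_error_rate` is made up for the example.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across n_tests independent
    tests, each run at significance level alpha (illustrative assumption)."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Testing all 6 measured variables versus only 2 PCA-derived variables.
fwer_six = familywise_error_rate(6)
fwer_two = familywise_error_rate(2)

print(f"6 tests: {fwer_six:.3f}")
print(f"2 tests: {fwer_two:.3f}")
```

Under these assumptions the chance of at least one spurious "significant" result is roughly 26% with six tests and a bit under 10% with two. Real survey variables are usually correlated, so the exact rates would differ; the sketch only shows the direction of the effect.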
My project gives a proof of concept for what I’m talking about in the context of social psychology. I’ve never seen such an application. So no, it’s not just the realization that it could be applied, it’s also giving a proof of concept: that’s why it took ~1500 hours rather than ~10 hours.
That’s probably where there’s something I don’t understand. I don’t understand why the analysis took ~1500 hours. Spending that much time with a dataset also instinctively triggers “fishing expedition” in my head. I don’t know to what extent that’s warranted.
I’m not sure that you have shown that it makes more sense to interpret that factor of individual preference as being about intelligence and sincerity than as being about the value of fun.
As far as I can see, it could also be that fun and physical attractiveness are simply more valued.
So I’d strongly encourage you to pursue your ideas more. I’ve been looking some at the General Social Survey data, where I haven’t yet found something highly nontrivial (maybe I’m looking at the data the wrong way, or maybe it’s just not a good dataset for this). I’d be happy to share my code with you / a cleaned form of the data, if you’re interested in exploring factors for political labels.
In the case of spending effort on the GSS, I can’t envision what success looks like. It’s straightforward to find PCA factors, but I don’t know how to put them to good use.
A more interesting project would be to explore LW’s ideological landscape.
I would be very interested in how various rationalist beliefs interact with each other.
Does seeing yourself as an “aspiring rationalist” correlate with beliefs about UFAI risk?
Having a project that searches for the main dimensions of disagreement in this community would be valuable.
Maybe 300 questions that are answered on a Likert scale: maybe 150 rationality questions, 100 Big Five questions and 50 autism questions.
It might be that I have gotten too cynical, but if you measure 6 variables it’s more likely that one of them gets a statistically significant result than if you first turn those 6 variables into 2 variables via PCA.
That’s probably where there’s something I don’t understand. I don’t understand why the analysis took ~1500 hours. Spending that much time with a dataset also instinctively triggers “fishing expedition” in my head. I don’t know to what extent that’s warranted.
The issue of multiple hypothesis testing is precisely why it took 1500 hours :-). I was dealing with the general question “how can you find the most interesting generalizable patterns in a human-interpretable data set?” It’ll take me a long time to externalize what I learned.
For now I’ll just remark that dimensionality reduction reduces concerns around multiple hypothesis testing. If you have a cluster of variables A and a cluster of variables B, and you suspect that there’s some relationship between the two clusters, you can do PCA on each cluster separately, then look at correlations between the first few principal components of each rather than at all pairwise correlations between variables in A and variables in B.
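The procedure described above can be sketched roughly as follows. This is only an illustration on synthetic data, not the actual speed dating analysis: the helper `first_components`, the cluster sizes, and the shared latent trait are all assumptions made up for the example.

```python
import numpy as np


def first_components(X: np.ndarray, k: int) -> np.ndarray:
    """Scores of each row of X on its first k principal components.

    Columns are standardised first, so every variable gets equal weight.
    """
    Xc = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


# Synthetic stand-in data (hypothetical): two clusters of 6 variables each,
# both partly driven by one shared latent trait plus independent noise.
rng = np.random.default_rng(0)
n = 500
latent = rng.normal(size=n)
A = latent[:, None] * rng.uniform(0.5, 1.0, size=6) + rng.normal(size=(n, 6))
B = latent[:, None] * rng.uniform(0.5, 1.0, size=6) + rng.normal(size=(n, 6))

k = 2
pcs_A = first_components(A, k)
pcs_B = first_components(B, k)

# Cross-correlation block: rows are components of A, columns components of B.
cross_corr = np.corrcoef(pcs_A.T, pcs_B.T)[:k, k:]

n_pairwise_tests = A.shape[1] * B.shape[1]  # every variable in A vs every one in B
n_component_tests = k * k                   # first k components of A vs those of B

print(cross_corr.round(2))
print(n_pairwise_tests, "pairwise tests vs", n_component_tests, "component tests")
```

With six variables per cluster and two components kept from each, the number of correlations to test drops from 36 to 4. The sign of each component is arbitrary, so only the magnitude of a correlation is meaningful here, and the choice of how many components to keep should be made before looking at the cross-correlations.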
A more interesting project would be to explore LW’s ideological landscape. I would be very interested in how various rationalist beliefs interact with each other. Does seeing yourself as an “aspiring rationalist” correlate with beliefs about UFAI risk?
There is the 2014 LW survey data, which is interesting, even if less substantive than what you have in mind. I have an unfinished project that I’m doing with it (got bogged down in cleaning it to make it nicely readable).
It might be that I have gotten too cynical, but if you measure 6 variables it’s more likely that one of them gets a statistically significant result than if you first turn those 6 variables into 2 variables via PCA.
Yes, this is the point :-)