Thanks for writing this! I really think people should be doing this (applying well-known algorithms to interesting datasets and seeing what happens) a lot more often overall, and it’s on my list of skills I’d really like to learn personally. So I’d be interested to hear a little more info on methodology—what programming language(s) you used, how you generated the graphs, etc.
I’m pretty skeptical of making any connections to the Bay Area rationalist community based on Berkeley’s conscientiousness score (which I think is interesting, but not for this reason). Roughly 100,000 people live in Berkeley, and most of them aren’t rationalists. And depending on how far back most of this data was collected, plausibly most of the Berkeley respondents were high school or college students (UC Berkeley alone has over 35,000 students): for a while that was the main demographic of Facebook users, and probably for a while longer it was the main demographic of Facebook users willing to take personality tests. (Edit: But see Douglas_Knight’s comment below.) In general, I’d think more carefully about selection effects like this before drawing any conclusions.
Glad you liked it :-).
I used R for this analysis. Some resources that you might find relevant:
Practical Data Science with R has a very nice introduction to exploratory data analysis.
Advanced R goes into more detail on the language.
The graphs were made using ggplot2.
I used the lme4 package for hierarchical (mixed-effects) modeling. See, e.g., Getting Started with Mixed Effect Models in R.
Kaggle Kernels has some good sample scripts.
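For anyone following up on the methodology question: the core idea behind a hierarchical (multilevel) model is partial pooling. Each city's estimate is pulled toward the overall mean, and the pull is stronger when the city has few respondents. The sketch below is in Python rather than R, and it is not the analysis from the post. It assumes a simple normal-normal model where the within-city standard deviation `sigma` and the between-city standard deviation `tau` are supplied by hand. lme4 estimates variance components like these from the data. For simplicity, the sketch also uses the plain mean of all scores as the grand mean. The name `partial_pool` and the example data are hypothetical.

```python
from statistics import fmean


def partial_pool(
    city_scores: dict[str, list[float]], sigma: float, tau: float
) -> dict[str, float]:
    """Shrink each city's mean toward the grand mean (normal-normal model).

    Assumption: sigma (within-city SD) and tau (between-city SD) are known
    inputs rather than estimated; the grand mean is the plain mean of all scores.
    """
    all_scores = [s for scores in city_scores.values() for s in scores]
    grand_mean = fmean(all_scores)
    prior_precision = 1 / tau**2
    pooled = {}
    for city, scores in city_scores.items():
        # Precision of the city's sample mean grows with its number of respondents.
        data_precision = len(scores) / sigma**2
        # Precision-weighted average of the city's own mean and the grand mean.
        pooled[city] = (
            data_precision * fmean(scores) + prior_precision * grand_mean
        ) / (data_precision + prior_precision)
    return pooled


# Hypothetical example: one respondent in city A, nine in city B.
estimates = partial_pool({"A": [4.0], "B": [2.0] * 9}, sigma=1.0, tau=1.0)
print(estimates)
```

In this example, the grand mean is 2.2. City A has a single respondent, so its raw mean of 4.0 is pulled halfway, to 3.1. City B's nine respondents keep its estimate close to its raw mean of 2.0.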
Douglas_Knight is correct – the average age of users is quite low, at ~26 years old both for the high conscientiousness cities and the low conscientiousness cities.
Thanks for the links!
I think you have the causality flipped around. Jonah is suggesting that something about Berkeley contributes to the prevalence of low conscientiousness among rationalists.
What I had in mind was that the apparent low average conscientiousness in the Bay Area might have been one of the cultural factors that drew rationalists who are involved in the in-person community to the location. But of course the interpretation that you raise is also a possibility.
Ah, I spoke imprecisely. I meant what you said, as opposed to things of the form “there’s something in the water”.
Previously on LW: Self control may be contagious
Actually, two of your complaints cancel out. You should expect that the population living in Berkeley has a very young personality, but if all the data is from college students, then there’s nothing special about Berkeley (except that it is large and thus small effects are statistically significant — but the claim is that it has a large effect).
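Here is a concrete version of the size-versus-significance point. With enough respondents, even a tiny difference in mean scores clears a conventional significance threshold, so a significant result for a large city says little about whether the effect is large. This sketch uses a two-sample z statistic and assumes a common, known standard deviation. The function names and sample sizes are hypothetical, not taken from the dataset.

```python
import math


def z_for_mean_difference(diff: float, sd: float, n1: int, n2: int) -> float:
    """z statistic for a difference in group means, assuming a common known SD."""
    standard_error = sd * math.sqrt(1 / n1 + 1 / n2)
    return diff / standard_error


def two_sided_p(z: float) -> float:
    """Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))


# Same hypothetical effect (0.05 SD) at two sample sizes.
for n in (100, 10_000):
    z = z_for_mean_difference(diff=0.05, sd=1.0, n1=n, n2=n)
    print(f"n per group = {n}: effect = 0.05 SD, z = {z:.2f}, p = {two_sided_p(z):.4f}")
```

The effect size is identical in both rows, and only the sample size changes. With 10,000 respondents per group, the p-value falls well below 0.05 even though the effect is still tiny.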
I think you are correct that the data is all college students (or at least fairly young people). I believe this because the cities being discussed are the hometown, not the current residence, which is the kind of thing you’d do with college students. In any event, studying hometown controls for the age demographics of Berkeley. But Jonah should have explicitly controlled for age.
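One simple way to control for age is direct age standardization. You compute each city's mean within age bands, then average those band means using a single shared reference age distribution, so cities with different age mixes are compared on the same footing. Adding age as a covariate in the regression model is the more usual approach; this is just the easiest version to show. The age bands, weights, and scores below are hypothetical, and the function assumes every city has respondents in every band.

```python
from statistics import fmean


def raw_mean(scores_by_band: dict[str, list[float]]) -> float:
    """Unadjusted mean over all respondents, whatever their age mix."""
    return fmean([s for scores in scores_by_band.values() for s in scores])


def age_standardized_mean(
    scores_by_band: dict[str, list[float]], reference_weights: dict[str, float]
) -> float:
    """Weight each age band's mean by a shared reference age distribution.

    Assumption: every band in reference_weights has at least one respondent.
    """
    total = sum(reference_weights.values())
    return sum(
        weight * fmean(scores_by_band[band])
        for band, weight in reference_weights.items()
    ) / total


# Hypothetical cities: same within-band scores, different age mixes.
city_x = {"18-24": [2.0] * 4, "25-40": [4.0]}
city_y = {"18-24": [2.0], "25-40": [4.0] * 4}
reference = {"18-24": 0.5, "25-40": 0.5}

print(raw_mean(city_x), raw_mean(city_y))
print(age_standardized_mean(city_x, reference), age_standardized_mean(city_y, reference))
```

The raw means (2.4 and 3.6) differ only because of age composition. After standardization, both cities come out at 3.0.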
Added: poking around the website, I don’t see a clear answer to how old the data is. Most of it seems to have been collected by 2011, but I’m not sure, because there are lots of variations. Each Big Five score is labeled with the date it was taken.
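Because each score carries the date it was taken, the age of the data can be checked directly. The sketch below assumes the dates have already been parsed into `datetime.date` objects. I don't know the field layout on the website, so the parsing step is left out. The function names and sample dates are hypothetical.

```python
from collections import Counter
from datetime import date


def share_collected_by(dates_taken: list[date], cutoff: date) -> float:
    """Fraction of scores taken on or before the cutoff date."""
    if not dates_taken:
        raise ValueError("no dated scores to summarize")
    return sum(d <= cutoff for d in dates_taken) / len(dates_taken)


def scores_per_year(dates_taken: list[date]) -> Counter:
    """Count of scores by calendar year, to see when collection tailed off."""
    return Counter(d.year for d in dates_taken)


# Hypothetical test dates.
sample = [date(2009, 5, 1), date(2010, 3, 15), date(2011, 12, 31), date(2012, 1, 2)]
print(share_collected_by(sample, date(2011, 12, 31)))
print(scores_per_year(sample))
```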
Good point, I missed this.