Suggesting there’s a market for repeating experiments (cheaply, as well) in rural India? This looks like it’d yield some easy research opportunities.
There are also language problems here: most psychological “experiments” consist of giving people questionnaires and then data-mining the answers. And questionnaires are very language- and culture-dependent.
I know it’s just anecdotal evidence, but I know someone who tried to translate a standard English questionnaire (mindfulness or somesuch) into Polish for their MSc thesis, and tested both versions on students majoring in English, who were supposedly fluent in both languages. All the usual controls were used, such as randomizing the order of the questionnaires and using multiple independent translations. In spite of all that, the correlations between answers to the exact same question in Polish and English were less than impressive (much lower than the usual test-retest correlation), and for many questions every single translation yielded a correlation that was not statistically significantly different from zero.
I think these problems would make a far better thesis than what actually got written, but as a rule failures don’t get written up or published.
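For anyone who wants to try this kind of per-question check themselves, here is a minimal sketch. It is not the analysis from that thesis (I don't know what was actually used there); it just assumes each respondent answers the same item once in English and once in the translation, computes the Pearson correlation between the two sets of answers, and tests it against zero with the Fisher z approximation. The function names and the `alpha` cutoff are made up for illustration.

```python
import math
from statistics import NormalDist, mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys):
        raise ValueError("both versions need the same respondents")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when everyone gives the same answer")
    return cov / math.sqrt(var_x * var_y)


def p_value_vs_zero(r, n):
    """Two-sided p-value for H0: correlation = 0 (Fisher z, normal approximation)."""
    if n < 4:
        raise ValueError("need at least 4 respondents for this approximation")
    # Clamp so atanh stays finite when r is exactly +1 or -1.
    r = max(min(r, 0.999999), -0.999999)
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def compare_item(english, translated, alpha=0.05):
    """Summarise how well one item's answers agree across the two languages."""
    r = pearson_r(english, translated)
    p = p_value_vs_zero(r, len(english))
    return {"r": r, "p": p, "distinguishable_from_zero": p < alpha}
```

You would call `compare_item` once per question and per translation, then compare the resulting `r` values against whatever test-retest correlation the original questionnaire reports. The normal approximation is rough for small samples, so treat borderline p-values with suspicion.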
Even worse than language difficulties, I would think, would be large differences in cultural framing of questions. Every culture brings a different set of background issues to the types of questions asked in many psychological studies. The problem has mostly been solved for IQ type tests, but, even without considering the amount of work involved in developing the cross-cultural IQ tests, framing would be a bigger problem for personality and other “softer” tests. (I have, but have only leafed through, Jensen’s “Bias in Mental Testing”; I can already tell it’s going to take a lot of work, and it’s a bit dated, so I’ve been putting it off since it’s only a peripheral interest.)
What evidence do you have that “the problem has mostly been solved for IQ type tests”?
Sorry, that sounded challenging, and it isn’t meant to be. Would you please point me to any books, papers, and so on?
“Arthur Jensen Replies to Stephen Jay Gould: THE DEBUNKING OF SCIENTIFIC FOSSILS AND STRAW PERSONS” (http://www.debunker.com/texts/jensen.html) is a good place to start. It’s a detailed criticism of Gould’s “The Mismeasure of Man” by one of the best psychometricians around. It has a good bibliography, but is rather dated, being from 1982. No matter what you may think of his politics, Steve Sailer also has a lot of good, more recent information in his essays on IQ, especially on international comparisons, on his website, www.isteve.com. Richard Lynn’s books are supposed to be very good also, but I haven’t read them (too many interests, too little time and money).
The very title “debunking of scientific fossils and straw persons” makes it sound like it has limited use. Johnicholas asked for positive statements, but a debunking is purely negative. Just because Gould lied about X doesn’t make his position wrong.
I suspected from your first comment that all you meant was that people who attempt to prove cultural bias in IQ tests have failed. That is certainly true, with some surprising findings, like that the American black-white gap is larger on questions that are, on the face of them, more culturally neutral. But relying on an opposition you don’t trust to do the research is a highly biased search strategy. It is not a great political victory to say that Raven’s matrices are culturally biased, so few say it, but that doesn’t make it false.
Right now, my best source for “answers to Arthur Jensen” is Cosma Shalizi. My understanding is that performance on IQ tests is mostly related to culture—even though that was (to some extent) Gould’s position.
http://cscs.umich.edu/~crshalizi/weblog/494.html
http://cscs.umich.edu/~crshalizi/weblog/495.html
http://cscs.umich.edu/~crshalizi/weblog/520.html
“performance on IQ tests is mostly related to culture”
Shalizi simply doesn’t say that.
There are two things you could mean by it. One is that some cultures make you smart. The other is that the IQ test mostly screens for culture and not useful abilities. It is certainly true that culture affects the difference between performance on Raven’s matrices (RM) and other tests. In particular, the Flynn effect is stronger for Raven’s matrices than for other tests. Also, sub-Saharan Africans do dramatically worse on RM than on other estimates, where they’re closer to African-Americans (who do slightly worse on RM than on common tests). In applying this information to the two possibilities about culture, you’d have to decide which testing approach you liked better, which would depend on what you’re trying to measure. “g” is not the correct answer to this question.
Yes, but then you have to send the researchers to India. (Unless you also recruit Indian psychologists who already live there to do your replications.)
Gap year students! They can dig some irrigation ditches while they’re there.
Critically, they don’t need to devise their own experiments; they’re effectively doing the leg-work for more senior researchers back in the UK/US, and also making use of the language & cultural skills they’ve learnt for their gap year/volunteering. Also, the data they gather can be used both to judge the hypothesis the test was originally investigating, and reveal differences between cultures and nations.
Ooh, this could be a scholarship thing. “Study Abroad And Do Replication Studies Fund”. Give ’em a grand apiece, no essay required, I bet it would work.
I’d be worried about trusting the students. It’s like giving them a test and your answer key, and telling them ‘hey, we did our best in getting the right answers, but please work through all the problems again and see whether we made any mistakes’. This sort of thing only works if you don’t get too much garbage in your replications.
The students might be honest enough to actually do all the work professionally, but I’m not sure I’d trust American students (a summer/semester isn’t that long, and if they’re in India, there are things to do there that could fill a lifetime; the temptation to just fudge up some data and go do all those awesome things would be tremendous), much less Indian ones.
“This sort of thing only works if you don’t get too much garbage in your replications.”
You have way too much trust in the professors. Just a few students naive enough to do what they’re supposed to would be an improvement on the status quo.
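The problem is, we already have replications being done by Indian and Chinese scientists and… they’re not very good. Here’s one: “Local Literature Bias in Genetic Epidemiology: An Empirical Evaluation of the Chinese Literature”, 2005:
“We targeted 13 gene-disease associations, each already assessed by meta-analyses, including at least 15 non-Chinese studies. We searched the Chinese Journal Full-Text Database for additional Chinese studies on the same topics. We identified 161 Chinese studies on 12 of these gene-disease associations; only 20 were PubMed-indexed (seven English full-text). Many studies (14–35 per topic) were available for six topics, covering diseases common in China. With one exception, the first Chinese study appeared with a time lag (2–21 y) after the first non-Chinese study on the topic. Chinese studies showed significantly more prominent genetic effects than non-Chinese studies, and 48% were statistically significant per se, despite their smaller sample size (median sample size 146 versus 268, p < 0.001). The largest genetic effects were often seen in PubMed-indexed Chinese studies (65% statistically significant per se). Non-Chinese studies of Asian-descent populations (27% significant per se) also tended to show somewhat more prominent genetic effects than studies of non-Asian descent (17% significant per se).”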
The huge amount of data that could be gathered should allow for checking; data that is both different from what westerners would expect, and consistent over several independent students, is likely to be accurate. Or at least, not inaccurate because of lazy students.
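A rough sketch of what that check could look like, assuming each student reports one effect estimate for the same experiment and you have the original (Western) estimate to compare against. The thresholds `max_spread` and `min_shift` are invented for illustration and would need to be chosen per study; the whole thing only means anything if the students really did work independently.

```python
from statistics import mean, stdev


def classify_replications(original_effect, student_effects,
                          max_spread=0.1, min_shift=0.2):
    """Label a set of independent replications of one experiment.

    max_spread and min_shift are hypothetical cutoffs, not established values.
    """
    if len(student_effects) < 2:
        raise ValueError("need at least two independent replications")
    avg = mean(student_effects)
    spread = stdev(student_effects)  # sample standard deviation across students
    consistent = spread <= max_spread
    differs = abs(avg - original_effect) >= min_shift
    if consistent and differs:
        verdict = "consistent, differs from original"
    elif consistent:
        verdict = "consistent, matches original"
    else:
        verdict = "inconsistent across students"
    return {"mean": avg, "spread": spread, "verdict": verdict}
```

The "consistent, differs from original" case is the interesting one: a student fudging data would presumably copy the expected result, so several students independently agreeing on an unexpected number is harder to explain by laziness. The "consistent, matches original" case is weaker evidence for exactly that reason.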