Not really a reply to you. I just found this and needed to put it somewhere. Anyone who has been following this discussion will be interested. It’s an interesting way of posing the question.
Now plot the genome of each human as a point on our lattice. Not surprisingly, there are readily identifiable clusters of points, corresponding to traditional continental ethnic groups: Europeans, Africans, Asians, Native Americans, etc. (See, for example, Risch et al., Am. J. Hum. Genet. 76:268–275, 2005.) Of course, we can get into endless arguments about how we define European or Asian, and of course there is substructure within the clusters, but it is rather obvious that there are identifiable groupings, and as the Risch study shows, they correspond very well to self-identified notions of race.
...
We see that there can be dramatic group differences in phenotypes even if there is complete allele overlap between two groups—as long as the frequency or probability distributions are distinct. But it is these distributions that are measured by the metric we defined earlier. Two groups that form distinct clusters are likely to exhibit different frequency distributions over various genes, leading to group differences.
...
This leads us to two very distinct possibilities in human genetic variation:
Hypothesis 1: (the PC mantra) The only group differences that exist between the clusters (races) are innocuous and superficial, for example related to skin color, hair color, body type, etc.
Hypothesis 2: (the dangerous one) Group differences exist which might affect important (let us say, deep rather than superficial) and measurable characteristics, such as cognitive abilities, personality, athletic prowess, etc.
Hsu’s blog post makes two claims about race. The first is that ‘Hypothesis 2’ could be correct, i.e., that there could be genetically driven differences in exciting traits like IQ between races (or ‘groups,’ but I think we all know which ‘groups’ we’re really interested in). I agree with this claim.
I completely disagree with the second claim, which is that genetic clustering studies constitute ‘the scientific basis for race.’ It’s true that scientists can extract clusters from genetic data that match what we call races. If you gave me a bunch of human genotypes sampled from around the world and let me fuck around with that data and run it through PCA for a few hours, I’m sure I could do the same. But it doesn’t automatically follow that my classification is correct.
For example, if you sample some whites, sample some blacks, and expect those two categories to automatically pop out of your analysis, you might be surprised. Here’s a recent paper that estimated the European ancestry in African-Americans by analyzing genotypes from samples of US whites, US blacks, and several subgroups of Africans. Running PCA on all of the genotype data and plotting the first two principal components for the subjects in each sample gave these clusters:
If we treat the widely separated clusters as races, we don’t automatically recover a black race and a white race. We end up with a Mandenka race, a white race, and a Bantu + Yoruba race, with African-Americans smeared out between them.
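To see why an admixed sample lands between its source clusters instead of forming a cluster of its own, here is a minimal pure-Python sketch. Everything in it is an assumption for illustration: two hypothetical source populations `A` and `B` with randomly invented allele frequencies, and admixed individuals whose per-SNP allele frequency is `a*pA + (1-a)*pB` for a random ancestry fraction `a`. PC1 is found by power iteration on the centered Gram matrix; this is not the pipeline the cited paper used.

```python
import math
import random

rng = random.Random(7)
N_SNPS = 200

# Invented allele frequencies for two hypothetical source populations A and B.
p_a = [rng.uniform(0.05, 0.95) for _ in range(N_SNPS)]
p_b = [rng.uniform(0.05, 0.95) for _ in range(N_SNPS)]


def draw(freqs):
    """One diploid genotype: 0, 1 or 2 copies of the counted allele per SNP."""
    return [(rng.random() < p) + (rng.random() < p) for p in freqs]


genotypes, labels = [], []
for _ in range(20):
    genotypes += [draw(p_a), draw(p_b)]
    labels += ["A", "B"]
for _ in range(20):
    a = rng.uniform(0.1, 0.9)  # this individual's (hypothetical) fraction of ancestry from A
    genotypes.append(draw([a * x + (1 - a) * y for x, y in zip(p_a, p_b)]))
    labels.append("admixed")


def pc1_scores(rows, iters=300):
    """PC1 scores (up to scale and sign) via power iteration on the centered Gram matrix."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(m)]
    xc = [[r[k] - means[k] for k in range(m)] for r in rows]
    gram = [[sum(u * v for u, v in zip(a, b)) for b in xc] for a in xc]
    vec = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(iters):
        nxt = [sum(g * v for g, v in zip(row, vec)) for row in gram]
        norm = math.sqrt(sum(t * t for t in nxt))
        vec = [t / norm for t in nxt]
    return vec


scores = pc1_scores(genotypes)


def group_mean(label):
    vals = [s for s, lab in zip(scores, labels) if lab == label]
    return sum(vals) / len(vals)


print({lab: round(group_mean(lab), 3) for lab in ("A", "admixed", "B")})
```

The printed dictionary gives each group's mean PC1 coordinate (sign and scale are arbitrary); the admixed mean should land between the two source means.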
The researchers could no doubt have come up with an alternative rotation of the axes that would’ve projected all of the African samples on top of each other, and the European sample far away from them. But what would justify the alternative projection over the original one?
Maybe my own personal concept of ‘race’ emphasizes differences among sub-Saharan Africans, instead of continental differences. Then I might do a PCA on a set of sub-Saharan African genotypes, find a couple of principal components that best separate out the sub-Saharan African subgroups, and only then plot the North Africans and non-Africans along with the sub-Saharans.
Here are a few plots from a study that did just that. Notice now that the most widely separated clusters are three, or perhaps four, sub-Saharan African clusters—and the rest of the world forms one little cluster in the middle of them!
If I were a scientist who had started with the idea that the main races consisted of several African subgroups, plus one other race containing all non-Africans, this analysis would seem to completely vindicate my initial beliefs! But the analysis would have turned out that way mainly because my original taxonomy of ‘races’ drove how I did it.
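The fit-on-a-subset-then-project procedure described above can be sketched in a few lines of pure Python. This is a hypothetical toy, not the cited study's method: two closely related synthetic populations `S1` and `S2` (small drift from a shared ancestral frequency vector) define PC1, and a third synthetic population `O`, which drifted much further, is projected onto that axis afterwards. The names, drift sizes and sample sizes are all assumptions.

```python
import math
import random

rng = random.Random(11)
N_SNPS = 300


def clip(p):
    return min(0.99, max(0.01, p))


# Hypothetical ancestral frequencies; S1 and S2 drift a little, O drifts a lot.
anc = [rng.uniform(0.2, 0.8) for _ in range(N_SNPS)]
p_s1 = [clip(p + rng.gauss(0, 0.15)) for p in anc]
p_s2 = [clip(p + rng.gauss(0, 0.15)) for p in anc]
p_o = [clip(p + rng.gauss(0, 0.3)) for p in anc]


def draw(freqs, n):
    """n diploid genotypes (0/1/2 allele counts per SNP)."""
    return [[(rng.random() < p) + (rng.random() < p) for p in freqs] for _ in range(n)]


s1, s2, outside = draw(p_s1, 30), draw(p_s2, 30), draw(p_o, 30)


def fit_pc1(rows, iters=300):
    """Column means and unit SNP-space loading vector of PC1, fit on `rows` only."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(m)]
    xc = [[r[k] - means[k] for k in range(m)] for r in rows]
    gram = [[sum(u * v for u, v in zip(a, b)) for b in xc] for a in xc]
    vec = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(iters):
        nxt = [sum(g * v for g, v in zip(row, vec)) for row in gram]
        norm = math.sqrt(sum(t * t for t in nxt))
        vec = [t / norm for t in nxt]
    # Map the individual-space eigenvector back to SNP space (w = X^T v) and normalize.
    w = [sum(vec[i] * xc[i][k] for i in range(n)) for k in range(m)]
    norm = math.sqrt(sum(t * t for t in w))
    return means, [t / norm for t in w]


def mean_projection(rows, means, w):
    """Average PC1 coordinate of `rows` on an axis that was fit on other samples."""
    proj = [sum((x - mu) * wk for x, mu, wk in zip(r, means, w)) for r in rows]
    return sum(proj) / len(proj)


means, w = fit_pc1(s1 + s2)  # the axis is chosen using S1 and S2 only
m_s1, m_s2, m_o = (mean_projection(g, means, w) for g in (s1, s2, outside))
print(round(m_s1, 2), round(m_o, 2), round(m_s2, 2))
```

In this toy setup `O` typically projects between `S1` and `S2`, even though its allele frequencies are further from `S1` than `S2`'s are: an axis chosen to separate the reference groups says little about a sample that differs from them in other directions.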
I’ve picked out two papers myself to make my points; now I’ll write a bit about the ‘Risch et al.’ paper Hsu points to. Risch et al. calculated genetic clusters by running data collected for the Family Blood Pressure Program through the structure program. Hsu writes that the clusters that emerged ‘correspond very well to self-identified notions of race.’
Well, there’s no ready-made algorithm which takes genotypes as input and spits out objectively determined races, and structure is no exception. There are some subtleties to how the program works. For one thing, it doesn’t automatically determine an optimal number of clusters and then sort the subjects into that many clusters: the researcher tells structure to put subjects into some number k of clusters, and the program then does its best to fit the subjects into k clusters. So the fact that structure’s output contained an intuitively pleasing number of clusters doesn’t mean very much.
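structure itself is a model-based (Bayesian) clustering method, so the sketch below swaps in plain k-means purely to illustrate the general point that the number of clusters is an input. The data are a single hypothetical, smoothly varying sample with no discrete groups at all, and the algorithm still hands back exactly as many clusters as it is asked for.

```python
import random


def kmeans_1d(values, k, iters=100, seed=0):
    """Plain Lloyd's k-means on 1-D data. k is supplied by the caller, never inferred."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)

    def assign():
        return [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]

    for _ in range(iters):
        labels = assign()
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # Move each center to its members' mean; reseed a center that lost all members.
            centers[c] = sum(members) / len(members) if members else rng.choice(values)
    return assign(), centers


# A single hypothetical sample that varies smoothly: there are no discrete groups in it.
data_rng = random.Random(42)
cline = [data_rng.uniform(0.0, 1.0) for _ in range(300)]

results = {}
for k in (2, 3, 4, 5):
    labels, centers = kmeans_1d(cline, k)
    results[k] = labels
    print(k, "clusters, sizes", sorted(labels.count(c) for c in range(k)))
```

Each requested k comes back as k non-empty groups of the same smooth sample, so the count reflects the request rather than anything discovered in the data.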
Another issue is that the kind of model that structure uses to represent distributions of genotypes, which treats samples as coming from discrete populations, is a poor fit when samples have been isolated by distance, with gene flow falling off gradually across geography. But, if Hsu is correct, this is exactly the case for Risch et al.’s data, since he writes that Risch et al.’s ‘clustering is a natural consequence of geographical isolation, inheritance and natural selection operating over the last 50k years since humans left Africa!’
There is more I could write, but I might as well just link this book chapter, which discusses issues with trying to algorithmically infer someone’s racial ancestry. I’ve already written more than I meant to—sorry for the lecture—but it disappoints me when someone well-credentialed (a professor of physics!) uncritically waves around ambiguous results to shore up a folk model of race.
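I’m typing this on an iPad, so apologies for mistakes. A picture for you here:
http://infoproc.blogspot.com/2009/06/genetic-clustering-40-years-of-progress.html
Yes, there are clines, but so what? The population fraction in the clinal region between the major groups is tiny.
The distance (e.g. measured by fst) between the continental groups is so large that you would have to stand on your head to not “discover” those as separate clusters.
See also here http://infoproc.blogspot.com/2008/11/human-genetic-variation-fst-and.html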
Yes, there are clines, but so what? The population fraction in the clinal region between the major groups is tiny.
I’m not sure that this contradicts what I wrote. I acknowledge that high-resolution genotyping enables one to distinguish geographically distant samples of people. Being able to pull that off does not automatically validate ‘race,’ as in the conventional white people v. yellow people v. brown people v. red people taxonomy.
The distance (e.g. measured by fst) between the continental groups is so large that you would have to stand on your head to not “discover” those as separate clusters.
Or you need only come at the data with an unusual preconception of race, which would affect your analytic approach.
Also, if you take wide-ranging genetic samples across Africa (as opposed to using a handful of samples from one Nigerian city to represent all of Africa, as seems to have been done to derive your picture), it seems to me that you end up getting African clusters that can be as far apart from each other as they are from Europeans.
Another example: check out panel A of this figure, from a paper that took samples from West and South Africa. The Fulani + Bulala are as far apart from some of the other African samples as they are from the Europeans!
it seems to me that you end up getting African clusters that can be as far apart from each other as they are from Europeans.
I doubt this would be the case as measured by fst. Note that distance on a principal components graph is not the same as fst: the components might be optimized to separate the clusters of choice (optimize the directions in gene space which show the most variance between the groups). It’s possible in principle that some groups (e.g., pygmies) in Africa have been as effectively separated in gene flow from other Africans as, say, Nigerians and Europeans. More likely, the fst distance between any two groups of Africans is less than the distance from the Yoruba to Europeans or E. Asians. That is what happens when you analyze the (better studied) sub-population structure of, e.g., Europe and Asia. That is, no two groups in E. Asia are anywhere near as far apart as they are collectively from Europeans (and the same for any two European groups vs distance to Asia). That’s just what you’d expect from the historical gene flow patterns, and I’d expect it to apply to Africa as well.
The real question is whether folk notions of ethnicity map onto clusters in gene space. If they do (and they do) it implies different frequency distributions for alleles in the groups. That raises the possibility of statistical group differences. What those differences are remains to be determined.
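For reference, here is a minimal sketch of one common way to compute Fst from allele frequencies, a Nei G_ST-style ratio (H_T - H_S) / H_T summed over SNPs. Unlike a distance on a PC plot, it depends only on the allele frequencies, not on which axes were chosen for display. The function name and inputs are assumptions, and published estimates usually rely on estimators with sample-size corrections (for example Weir and Cockerham's or Hudson's), which this sketch omits.

```python
def nei_fst(p1, p2):
    """Fst = (H_T - H_S) / H_T for two populations, as a ratio of sums over SNPs.

    p1, p2: allele frequencies of the same SNPs in each population (hypothetical
    inputs). No sample-size correction is applied.
    """
    h_t = h_s = 0.0
    for a, b in zip(p1, p2):
        p_bar = (a + b) / 2
        h_t += 2 * p_bar * (1 - p_bar)                   # heterozygosity of the pooled population
        h_s += (2 * a * (1 - a) + 2 * b * (1 - b)) / 2  # mean within-population heterozygosity
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


print(nei_fst([0.2, 0.5], [0.2, 0.5]))            # 0.0: identical frequencies
print(nei_fst([0.0, 0.0], [1.0, 1.0]))            # 1.0: fixed differences at every SNP
print(round(nei_fst([0.2, 0.5], [0.4, 0.5]), 4))  # 0.0217
```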
I agree on the subject of Fst; if you switch from PCA biplots to Fst, that’s going to better emphasize differences due to geographical separation. (But likely still not enough to scientifically confirm a classical racial taxonomy as the one true racial taxonomy. One would still have to decide which samples to use to build one’s Fst matrix and address the issue of how to extract racial categories from the Fst matrix. I’d also anticipate getting caught up in the same sort of issues as the structure program.)
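To make the ‘how do you extract categories from an Fst matrix’ problem concrete, here is a hypothetical sketch that links any two populations whose pairwise Fst falls below a chosen threshold and reports the connected groups (equivalent to cutting a single-linkage tree at that height). The five populations and their Fst values are invented numbers for illustration only, not estimates for real groups. How many groups come out is set by the threshold, which is a choice the analyst has to make.

```python
def groups_at_threshold(names, fst, threshold):
    """Group populations linked by any chain of pairwise Fst values below `threshold`."""
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if fst[i][j] < threshold:
                parent[find(names[i])] = find(names[j])
    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Invented, symmetric Fst values for five hypothetical populations (not real estimates).
names = ["P1", "P2", "P3", "P4", "P5"]
fst = [
    [0.00, 0.01, 0.04, 0.13, 0.14],
    [0.01, 0.00, 0.05, 0.12, 0.13],
    [0.04, 0.05, 0.00, 0.12, 0.15],
    [0.13, 0.12, 0.12, 0.00, 0.02],
    [0.14, 0.13, 0.15, 0.02, 0.00],
]

for t in (0.015, 0.03, 0.06, 0.2):
    print(t, groups_at_threshold(names, fst, t))
```

With these made-up numbers the same matrix yields four, three, two or one group depending only on the cut-off.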
The real question is whether folk notions of ethnicity map onto clusters in gene space.
Folk notions of ethnicity arguably could, because they are far more squishy and pliable than folk notions of race.
If they do (and they do) it implies different frequency distributions for alleles in the groups.
I can’t help feeling that you believe I’m arguing against the validity of race because I think that disproves the possibility of statistical group differences. If so, you can rest easy. I acknowledge the possibility of statistical group differences—it doesn’t live or die by the validity of race. I see (or think I do, anyway) genetic group differences in (relatively) boring traits like skin color and hair color—and if those, why not genetic group differences in drama-provoking traits like IQ, personality or genital size?
OK, so we just differ in nuances of definition. If you prefer ethnicity to race, that’s fine with me.
Well, for whatever it’s worth, I continue to disagree with one of the arguments in the blog entry I mentioned—there is more here than a minor semantic divide.
The usual lame argument is “race doesn’t exist, so how could there be group differences”—but I think neither of us is arguing that side.
Correct.
So your position is that there are probably allele clusters due to cultural and geographic isolation (and therefore potentially group differences in IQ or personality), and your concern is that you don’t think those clusters have been shown to map one-to-one onto our folk racial categories?
Do you think our folk racial categories aren’t the product of observable phenotypes? Do you think those categories at least approximate a valid scientific taxonomy?
My concern (or at least the one that I’m elaborating on in this thread) is that those clusters can be made to map onto folk racial categories, or made to be only partly consistent with folk racial categories, or made to be contradictory to folk racial categories, depending upon how one’s own preconceptions of race color one’s cluster analyses.
Do you think our folk racial categories aren’t the product of observable phenotypes?
No.
Do you think those categories at least approximate a valid scientific taxonomy?
Valid for which scientific purpose? They are likely to be workable categories for a sociologist studying race relations. They are likely to be inadequate categories for a molecular anthropologist studying human genetic variation. Though I expect some molecular anthropologists (and evidently at least one professor of physics) would dispute that.
I’ve already written more than I meant to—sorry for the lecture
Here of all places this is unnecessary. I posted the link specifically hoping someone would respond like this.
It’s true that scientists can extract clusters from genetic data that match what we call races. If you gave me a bunch of human genotypes sampled from around the world and let me fuck around with that data and run it through PCA for a few hours, I’m sure I could do the same. But it doesn’t automatically follow that my classification is correct.
If we treat the widely separated clusters as races, we don’t automatically recover a black race and a white race. We end up with a Mandenka race, a white race, and a Bantu + Yoruba race, with African-Americans smeared out between them.
If we’re discovering clusters that don’t fit with our racial preconceptions, that is evidence that the clusters that do match some of our racial preconceptions aren’t bullshit. Also, aren’t we looking for genetic evidence of cultural and geographical isolation? Isn’t the fact that we see different clusters for different groups in Africa just evidence that those groups have been (reproductively) isolated for a really long time? I would predict from these findings that when humans first left the continent there were already distinct groupings, and that not all of these groupings had descendants that left Africa.
Also, from the chart posted here I would predict that the Africans kidnapped and purchased as slaves came more from the Yoruba and much less so from the Mandenka. They probably didn’t all come from the Yoruba; perhaps the others came from the groups in the upper right corner of this chart that you linked in your other comment. Or perhaps they didn’t come from the Yoruba at all, but from other groups in that corner, and the Yoruba are just closely related to those groups.
EDIT: It turns out a lot of tribes had members taken as slaves; nearly every major tribe appears to have been affected. I’m going to have to find something that gives proportions, which will take longer.
From your other comment on that chart.
The Fulani + Bulala are as far apart from some of the other African samples as they are from the Europeans!
If you search for pictures of both groups, you can see the phenotypic differences as well.
I may say more after I look at that chapter.
Here of all places this is unnecessary. I posted the link specifically hoping someone would respond like this.
Mission accomplished! :-)
If we’re discovering clusters that don’t fit with our racial preconceptions, that is evidence that the clusters that do match some of our racial preconceptions aren’t bullshit.
Sounds reasonable.
Also, aren’t we looking for genetic evidence of cultural and geographical isolation? Isn’t the fact that we see different clusters for different groups in Africa just evidence that those groups have been (reproductively) isolated for a really long time?
It can be, although variation along principal component axes can also represent genetic change due to migration. (I picked up on this potential confound by reading a Nature Genetics paper that made the same point from the opposite direction. That is, variation along a PC can be due to continuous geographic separation instead of migration.)
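A minimal sketch of the continuous-separation case, under invented assumptions: individuals sit at random positions along a one-dimensional range, and each SNP's allele frequency changes linearly with position, so there are no discrete groups and no migration event. PCA (power iteration, pure Python) still produces a PC1 that closely tracks position, consistent with the point that a gradient along a PC does not by itself tell you which process produced it. All names, slopes and sizes are hypothetical.

```python
import math
import random

rng = random.Random(3)
N_IND, N_SNPS = 80, 400

# Hypothetical cline: each SNP's frequency is base + slope * (x - 0.5) for position x in [0, 1].
base = [rng.uniform(0.35, 0.65) for _ in range(N_SNPS)]
slope = [rng.uniform(-0.6, 0.6) for _ in range(N_SNPS)]
positions = [rng.random() for _ in range(N_IND)]

genotypes = []
for x in positions:
    freqs = [a + b * (x - 0.5) for a, b in zip(base, slope)]  # stays within [0.05, 0.95]
    genotypes.append([(rng.random() < p) + (rng.random() < p) for p in freqs])


def pc1(rows, iters=300):
    """PC1 scores (up to scale and sign) via power iteration on the centered Gram matrix."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(m)]
    xc = [[r[k] - means[k] for k in range(m)] for r in rows]
    gram = [[sum(u * v for u, v in zip(a, b)) for b in xc] for a in xc]
    vec = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(iters):
        nxt = [sum(g * v for g, v in zip(row, vec)) for row in gram]
        norm = math.sqrt(sum(t * t for t in nxt))
        vec = [t / norm for t in nxt]
    return vec


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


r = pearson(pc1(genotypes), positions)
print(round(abs(r), 3))
```

It prints the absolute correlation between PC1 and position; under these assumptions it should be close to 1.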
Also, from the chart posted here I would predict that the Africans kidnapped and purchased as slaves came more from the Yoruba and much less so from the Mandenka.
That looks about right to me. Table 1 from the paper estimating African ancestry gives a detailed breakdown of the African ancestry of the African-American sample, and it fits what you suggest.