Headquarters in Shenzhen, China. Raised funding of US$1.6 billion. Nearly 5000 employees (1000 in software development alone). More sequencing power than all of US or Europe combined. Aims to become leading platform for sequencing and bioinformatics. Soon to hit 1000 genomes sequenced per day at $5k cost per genome. Previous successes: participant in original Human Genome Project (1 percent), rice genome, panda genome, Tibetan altitude adaptation, early hominid sequence, over 1000 Han genomes sequenced.
Seek 10^3 or more subjects with IQ +3 SD or higher (roughly 1 in 1000). Conveniently pre-filtered population: students invited to training camps for physics, math and informatics Olympiads. Each student ranked roughly top 5 in his or her province, roughly 100 per subject per year in China. Math ability, possibly as high as +4 SD, general intelligence probably roughly +3 SD. Randomized testing to check these estimates.
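As a quick check on the "roughly 1 in 1000" figure, here is a minimal sketch that converts an SD cutoff into a population frequency. It assumes the trait is exactly normally distributed, which is an idealization (real score distributions can have fatter tails), and the function name `one_in` is just an illustrative choice.

```python
from statistics import NormalDist


def one_in(sd_cutoff):
    """Return N such that about 1 person in N scores above sd_cutoff SDs,
    assuming a standard normal distribution (an idealization)."""
    tail = 1.0 - NormalDist().cdf(sd_cutoff)  # upper-tail probability
    return 1.0 / tail


if __name__ == "__main__":
    for cutoff in (3, 4):
        print(f"+{cutoff} SD: about 1 in {one_in(cutoff):,.0f}")
```

Under the normal assumption, +3 SD comes out somewhat more common than 1 in 1000 (closer to 1 in 740), so the slide's figure is a round-number approximation; +4 SD is on the order of 1 in 30,000.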
Expect full sequencing (not just SNP genotyping) of 10^5 to 10^6 individuals within next few years. (Recall, BGI should reach rate of 10^3 per day within a year!) Probably paid for by science agencies of national governments. (Total cost roughly US $1 billion or so … comparable to first genome sequenced by Human Genome Project!) IF sufficient phenotype data is collected about these individuals, will have very well-powered GWAS studies within next few years – enough statistical power to capture a good fraction of total additive variance (about .6 for intelligence).
I was struck by Hsu’s estimate of how well the selection would work if the optimistic estimates about how many alleles they find work out:
Suppose that we have 100 g-affecting loci, each with minor allele frequency (MAF) of 0.1 and an average effect of 0.02.
• Instead of using a random sperm cell to conceive, a couple might go through 500 cells and choose the one with the most + alleles.
• If all couples in the population do this, what will happen to the level of g?
In the offspring generation, the mean level of g will increase by 0.2 SDs.
• Consequences will be especially prominent at the tails …
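The thought experiment above can be sketched as a small Monte Carlo simulation. This is one reading of the slide, not Hsu's own calculation, and it rests on assumptions that are not in the source: the loci are treated as unlinked and in Hardy-Weinberg proportions, the "average effect of 0.02" is taken as the shift in g (in SD units) per + allele, and only the father's sperm is selected. The function name and its parameters are hypothetical.

```python
import random


def mean_gain_from_sperm_selection(n_loci=100, maf=0.1, effect=0.02,
                                   n_cells=500, n_fathers=2000, seed=0):
    """Monte Carlo estimate of the mean gain (in SD units) from using the
    best of n_cells sperm instead of a random one.

    Assumptions (not from the source): unlinked loci, Hardy-Weinberg
    proportions, `effect` is the per-allele shift in SD units.
    """
    rng = random.Random(seed)
    het_prob = 2 * maf * (1 - maf)  # expected heterozygosity, 0.18 for MAF 0.1
    total_gain = 0.0
    for _ in range(n_fathers):
        # Only heterozygous loci can differ between one man's sperm cells.
        n_het = sum(rng.random() < het_prob for _ in range(n_loci))
        if n_het == 0:
            continue  # all sperm identical at these loci, so the gain is zero
        # Each sperm carries the + allele at a heterozygous locus with
        # probability 1/2, so its + count is the number of set bits among
        # n_het random bits.
        best = max(bin(rng.getrandbits(n_het)).count("1")
                   for _ in range(n_cells))
        # A randomly chosen sperm carries n_het / 2 + alleles on average.
        total_gain += (best - n_het / 2) * effect
    return total_gain / n_fathers


if __name__ == "__main__":
    print(f"mean gain: {mean_gain_from_sperm_selection():.3f} SD")
```

The simulated gain depends heavily on these modeling choices (how the effect size is defined, whether both parents' gametes are screened, linkage between loci), so it should not be expected to reproduce the slide's 0.2 SD exactly. What it does illustrate is the mechanism: the gain scales with the spread among one man's gametes times the expected maximum of 500 draws.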
There are some delicious discussions in there; for example, on weaknesses in previous research (this sort of discussion is why I tend toward skepticism, as in my previous email on genes):
Several positive findings from candidate gene studies of g have been reported [13, 19, 34, 35, 75, 79, 88]. However, given the consistent failures to replicate these findings [41, 56, 63], it appears that most or all of these reports are false positives. The poor track record of candidate gene studies is not peculiar to research on g but rather is characteristic of research on a wide variety of traits. In retrospect this trend is not surprising. Researchers performing candidate gene studies have labored under the illusion that a lax statistical significance threshold is acceptable if the total number of tested hypotheses is small. As succinctly explained by the Wellcome Trust Case Control Consortium, however, the critical factor is not the number of tested hypotheses but rather the prior probability that any given hypothesis is correct [99]. Now consider the fact that there are more than 10^7 SNPs with a minor allele frequency (MAF) exceeding .01 in the human species [1, 52]. Any reasonable prior probability that one of these SNPs has a detectable effect on a particular phenotype must be extremely small. Since prior probabilities do not depend on the amount of data gathered, extremely strong evidence of association is required to overcome a conservative prior probability, regardless of how many loci are examined in a particular study. Theoretical calculations and practical experience have shown that any associations clearing a significance threshold of 5 × 10^−8 will attain a high posterior probability of being authentic [43, 66, 99]. Since candidate gene studies have not employed significance thresholds anywhere near this strict, statistical considerations provide a sufficient explanation for the inconsistent results under this approach.
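The prior-probability argument in that passage can be made concrete with Bayes' rule in odds form: posterior odds = prior odds × (power / significance threshold). The sketch below is illustrative only. The prior of 1 in 100,000 and the power of 0.5 are made-up assumptions chosen to show the shape of the argument, not values taken from the proposal, and the function name is hypothetical.

```python
def posterior_true_association(prior, power, alpha):
    """Posterior probability that a 'significant' SNP is a true association.

    prior: prior probability that the SNP truly affects the trait
    power: probability of reaching significance if the effect is real
    alpha: significance threshold (false-positive rate under the null)
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * (power / alpha)  # Bayes' rule in odds form
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    prior = 1e-5   # assumed: about 1 SNP in 100,000 has a detectable effect
    power = 0.5    # assumed: even odds of detecting a real effect
    for alpha in (0.05, 5e-8):
        p = posterior_true_association(prior, power, alpha)
        print(f"alpha = {alpha:g}: posterior = {p:.4f}")
```

With these assumed inputs, a hit at the lax 0.05 threshold is almost certainly a false positive, while a hit at 5 × 10^−8 is very probably real. Note that the number of hypotheses tested appears nowhere in the calculation, which is the passage's point.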
Here’s information on Big Five and heredity:
A paper in press by de Moor and colleagues reports a GWAS of the Big Five personality traits [2]. Although these personality traits do not include g, the results are nevertheless arguably relevant. The discovery sample consisted of 17,375 adults; five in silico replication samples totaling 3,294 adults were also employed. Genome-wide significance was obtained for Openness to Experience near the RASA1 gene (p = 2.8 × 10^−8) and for Conscientiousness in the brain-expressed KATNAL2 gene (p = 4.9 × 10^−8). However, the replication samples did not show significant associations between the top SNPs and the personality traits, although the direction of effect of the KATNAL2 SNP on Conscientiousness was consistent in all replication samples.
The results of the de Moor study are sobering. Compare its number of hits (one at best) with those from the initial studies of height:
The GWAS of 13,664 individuals by Weedon and colleagues uncovered seven loci showing evidence of association at p < 5 × 10^−8, all of which were later replicated [98]; and
The GWAS of 15,821 individuals by Lettre and colleagues also uncovered seven loci showing evidence of association at p < 5 × 10^−8, all of which were later replicated [55].
According to the comprehensive analysis of the GIANT Consortium, there are only ten common variants associated with height that account for more than 0.1% of the variance in that trait [53]. It appears, then, that the variants most strongly associated with the Big Five personality traits may not even account for 0.1% of trait variance.
Apropos of that, the slides excerpted above are from Hsu’s talk on the Chinese effort: http://duende.uoregon.edu/~hsu/talks/ggenomics.pdf
The talk isn’t really citable, but there is the paper “BGI Cognitive Genomics Lab: Proposal for Gene-Trait Association Study of g”.