“My intuition says it would be hard to mine a few million SNPs, pick the most strongly associated 9500, and have them account for less than .29 of the variance, even if there were no relationship at all.”
With sample sizes of thousands or low tens of thousands you’d get almost nothing. Going from 130k to 250k subjects took it from 0.13 to 0.29 (where the total contribution of all common additive effects is around 0.5).
Most of the top 9500 are false positives (the top 697 are genome-wide significant and contribute most of the variance explained). Larger sample sizes let you overcome noise and correctly weight the alleles with actual effects. The approach looks set to explain everything you can get (and the bulk of heritability for height and IQ) without whole genome sequencing for rare variants just by scaling up another order of magnitude.
“My intuition says it would be hard to mine a few million SNPs, pick the most strongly associated 9500, and have them account for less than .29 of the variance, even if there were no relationship at all.”
With sample sizes of thousands or low tens of thousands you’d get almost nothing. Going from 130k to 250k subjects took it from 0.13 to 0.29 (where the total contribution of all common additive effects is around 0.5).
Most of the top 9500 are false positives (the top 697 are genome-wide significant and contribute most of the variance explained). Larger sample sizes let you overcome noise and correctly weight the alleles with actual effects. The approach looks set to explain everything you can get (and the bulk of heritability for height and IQ) without whole genome sequencing for rare variants just by scaling up another order of magnitude.