Genome sequencing for the masses is not quite here yet :-(
A Stanford study reported that at the moment a full sequencing costs about $17,000, requires more than 100 man-hours of analysis per genome, and is still “associated with incomplete coverage of inherited disease genes, low reproducibility of detection of genetic variation with the highest potential clinical effects, and uncertainty about clinically reportable findings.”
Yeah, sequencing is tough, especially for creatures like us with 3 billion mostly repetitive nucleotides. The high-throughput methods basically throw the genome into a blender and read out billions of individual ~50-100 base pair reads in one reaction. The error rate is high enough that you need about 10x coverage of the genome before you can be confident that most sites are covered by enough reads to keep you from making a couple million mistakes. The short read length makes repetitive sequences particularly hard to sequence, because if a read is shorter than the repeat you don’t know where to map it. That is why our lab (like most labs that only deal with a few kilobases at a time and are doing something other than cataloguing natural variation) still uses old-school Sanger sequencing: it produces 800 base pair reads one at a time for on the order of $2 each. The highest-throughput method, Illumina, also produces many terabytes of image data per run from tiny CCDs inside the sequencer, and those images need to go through some epic processing to become actual sequence data.
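To put rough numbers on the coverage point, here is a minimal back-of-the-envelope sketch. It assumes reads land uniformly at random, so that the depth at any one site is roughly Poisson-distributed around the mean coverage; real data is lumpier than that, especially in repeats. The function names and the "at least 4 reads" cutoff are made up for illustration and do not come from the study or from any sequencing pipeline.

```python
import math

GENOME_SIZE = 3_000_000_000  # ~3 billion nucleotides, as quoted above


def reads_needed(genome_size, read_length, coverage):
    """Number of reads so that reads * read_length >= coverage * genome_size."""
    return math.ceil(coverage * genome_size / read_length)


def prob_depth_below(min_depth, mean_coverage):
    """P(depth < min_depth), assuming per-site depth ~ Poisson(mean_coverage)."""
    return sum(
        math.exp(-mean_coverage) * mean_coverage**k / math.factorial(k)
        for k in range(min_depth)
    )


if __name__ == "__main__":
    # The cutoff of 4 reads per site is an arbitrary illustrative threshold.
    for coverage in (2, 5, 10, 30):
        n_reads = reads_needed(GENOME_SIZE, 100, coverage)
        p_thin = prob_depth_below(4, coverage)
        print(
            f"{coverage:>2}x: {n_reads:,} reads of 100 bp, "
            f"~{p_thin * GENOME_SIZE:,.0f} sites with fewer than 4 reads"
        )
```

Even under this idealized model, the number of thinly covered sites only drops off quickly once the mean depth is well above the per-site minimum you want, which is the intuition behind asking for something like 10x.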
More importantly, finding a rare or unique variant via sequencing that is something other than ‘this vital gene is broken and won’t make a protein at all’ doesn’t necessarily tell you all that much. Every one of us has about 100 new mutations that were not in our parents, and a mutation is likely to have many small impacts rather than one large impact. While we know that, say, height is something like 80% heritable, the best genetic screens so far have found several hundred loci that collectively account for something like 15% of the variation. There is a LOT going on, most individual differences have tiny effects, and our methods thus far can only really find common variants with relatively large impacts. Hence 23andMe uses microarrays that specifically look for known variants rather than actually sequencing.
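For a sense of how big that gap is, a quick bit of arithmetic helps. This sketch assumes the 15% figure is a share of the total variation in height, on the same scale as the 80% heritability. If it were instead meant as a share of the heritable part, the numbers would differ. The function name is made up for illustration.

```python
def unexplained_share(heritability, explained_by_loci):
    """Fraction of the heritable variance NOT accounted for by known loci.

    Both arguments are fractions of total phenotypic variance (an assumption
    about how the figures above are meant to be read).
    """
    if not 0 < heritability <= 1:
        raise ValueError("heritability must be in (0, 1]")
    if not 0 <= explained_by_loci <= heritability:
        raise ValueError("explained_by_loci must be in [0, heritability]")
    return 1 - explained_by_loci / heritability


if __name__ == "__main__":
    share = unexplained_share(heritability=0.80, explained_by_loci=0.15)
    print(f"Heritable variance still unaccounted for: {share:.0%}")
```

Read that way, the several hundred known loci cover a bit under a fifth of the heritable variation, and the rest is still unaccounted for.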
Thanks for the details—very helpful in keeping the goshwow under control.