Marlon comments on Whole genome sequencing vs SNP genotyping

Marlon 12 Jun 2015 15:40 UTC
0 points
The main idea of WGS is that you get all the SNPs, whereas SNP tests only give you the most common ones.

I’m not really sure how you would use the data from WGS (let’s say the genome is assembled too, or maybe that would cost more?). You would probably use BLAT on your local machine to search for genes with known SNPs. I don’t think you could do much more than that (finding novel SNPs is out of reach here).

I would guess the main idea would be to be able to check for new SNPs as more and more are found in the literature. However, the literature is not that easy to skim through except for the most common SNPs that are already included in the SNP tests.

Going back to the literature: for most multi-factorial diseases, you will see data coming from GWAS and linkage disequilibrium studies that is really hard to interpret. A SNP popping up like that does not necessarily mean that you’ve got the trait associated with it.

My comment was probably not very well organized, but I should still conclude. In my opinion, do a WGS only if you’ve got enough knowledge of bioinformatics (and I mean an engineer’s level). SNP tests are cheap and will provide you with almost everything you could get from a WGS.
- Douglas_Knight 12 Jun 2015 17:07 UTC
  5 points
  I agree with your general point, but here is a technical comment: 23andMe tests the million most common SNPs, but that is not the same as the million most common variants, because not all variation is in the form of a SNP. SNP stands for “single nucleotide polymorphism”: it means that one letter is changed while the context is unchanged. They are easy to detect because of that context, and that ease of detection is why they are used.
  
  Another kind of variation is an insertion or a deletion. They are harder to detect with a SNP test, which is why 23andMe only detects three of them, ones in the BRCA genes that are common among Ashkenazi Jews. It does not attempt to detect even the ones that are equally common among the Dutch. Insertions and deletions are easy to detect with whole genome sequencing and they are valuable to detect because they are fairly easy to interpret: the whole protein is ruined. What the protein does and what you can do about it are harder problems, but it’s not like finding a new SNP, where it probably means nothing.
  
  A third kind of variation is copy number variation, where there is a repetitive section of the DNA and the number of repeats varies from person to person. But whole genome sequencing today is bad at such regions, at least if the number of repeats is large. A lot of people think that they are important, but the fact that they are hard to measure makes that hard to assess at this time.
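
The distinction drawn above between SNPs and insertions/deletions can be sketched in a few lines of Python. This is only an illustration, not anything taken from 23andMe or a sequencing pipeline: it assumes each variant is given as a reference allele and an alternate allele (the way VCF-style files list them), and the function names and example alleles are hypothetical. The frameshift check reflects one common reason an insertion or deletion in a coding region ruins the protein: if the length change is not a multiple of three, every codon after it is misread.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify a variant from its reference and alternate alleles.

    Hypothetical helper for illustration; real variant callers handle
    many more cases (multi-allelic sites, complex substitutions, etc.).
    """
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "SNP"        # one letter changed, context unchanged
    if len(alt) > len(ref):
        return "insertion"  # extra bases relative to the reference
    if len(alt) < len(ref):
        return "deletion"   # bases missing relative to the reference
    return "other"          # same-length multi-base change or no change


def is_frameshift(ref: str, alt: str) -> bool:
    """True if the length change is not a multiple of 3.

    Only meaningful for variants inside protein-coding sequence,
    which this sketch simply assumes.
    """
    return (len(alt) - len(ref)) % 3 != 0


# Toy alleles, made up for the example.
examples = [("A", "G"), ("A", "AC"), ("ATG", "A"), ("ACGT", "A")]
for ref, alt in examples:
    print(ref, alt, classify_variant(ref, alt), is_frameshift(ref, alt))
```

Copy number variation is deliberately left out: it cannot be read off a single ref/alt pair, which matches the point above that it is the hardest of the three to measure.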