I agree with your general point, but here is a technical comment: 23andMe measures the million most common SNPs, but that is not the same as the million most common variants, because not all variation takes the form of a SNP. SNP stands for “single nucleotide polymorphism”: one letter is changed while the surrounding context is unchanged. That unchanged context makes SNPs easy to detect, and that ease of detection is why they are used.
Another kind of variation is an insertion or a deletion. These are harder to detect with a genotyping chip, which is why 23andMe only detects three of them, the ones in the BRCA1 and BRCA2 genes that are common among Ashkenazi Jews. It does not attempt to detect even the ones that are equally common among the Dutch. They are easy to detect with whole genome sequencing, and they are valuable to detect because they are fairly easy to interpret: an insertion or deletion usually shifts the reading frame, so the whole protein is ruined. What the protein does and what you can do about it are harder problems, but it’s not like finding a new SNP, where it probably means nothing.
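To make the SNP versus insertion/deletion distinction concrete, here is a rough sketch of how one might classify a variant from its reference and alternate alleles, written the way variant files usually list them. The function names and the example alleles are made up for illustration; they are not real BRCA sequences and not anything 23andMe publishes. The second function checks whether the length change is a multiple of three. If it is not, every codon downstream is read out of frame, which is the usual way an insertion or deletion ruins a protein.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify a variant from its reference and alternate alleles.

    Hypothetical helper for illustration: alleles are plain strings of
    A/C/G/T, with any shared context base included on both sides.
    """
    if ref == alt:
        return "no change"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"          # one letter swapped, context untouched
    if len(alt) > len(ref):
        return "insertion"    # extra letters added
    if len(alt) < len(ref):
        return "deletion"     # letters removed
    return "other"            # same length, several letters changed


def shifts_reading_frame(ref: str, alt: str) -> bool:
    """True if the length change is not a multiple of three.

    Only meaningful for variants inside protein-coding sequence.
    """
    return (len(alt) - len(ref)) % 3 != 0


# Made-up example alleles, not real variants.
examples = [("A", "G"), ("TAG", "T"), ("T", "TC"), ("TGAG", "T")]
for ref, alt in examples:
    print(ref, alt, classify_variant(ref, alt), shifts_reading_frame(ref, alt))
```

A three-letter deletion like the last example removes one amino acid but leaves the rest of the protein in frame, which is why it does not count as a frameshift here.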
A third kind of variation is copy number variation, where a section of the DNA is repeated and the number of repeats varies from person to person. Whole genome sequencing today is bad at such regions, at least when the number of repeats is large. A lot of people think these variants are important, but because they are hard to measure, that is hard to assess at this time.