Random analysis! From the fact that they anticipate spending $400 million (roughly $100 per person) to record and track about 4 million people, you can tell they are talking about using microarrays to log SNP profiles (like 23andMe) or microsatellite repeat lengths, or some other cheap and easy marker-based approach, rather than de novo sequencing. De novo sequencing that many people would generate far more human DNA sequence data than has been produced in the entire history of the world, would tie up the world’s current complement of high-throughput sequencers for a long time, would be no more useful for legal purposes, and would probably cost $40 billion or more (probably more still once the infrastructure is built).
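As a rough check on the arithmetic, here is a small Python sketch. The budget, headcount, and $40 billion figure come from the comment; the ~3.1 billion bases per haploid genome and the 30x depth (the figure that comes up later in the thread) are assumptions, and the helper names are made up for illustration.

```python
# Back-of-the-envelope check of the figures in the comment above.
# ASSUMPTIONS (not from the comment): ~3.1e9 bases per haploid human genome,
# and 30x sequencing depth for every person.

BUDGET_USD = 400e6       # stated budget
PEOPLE = 4e6             # stated number of people to record
WGS_TOTAL_USD = 40e9     # the comment's rough whole-genome cost estimate

GENOME_BASES = 3.1e9     # assumed haploid genome length
DEPTH = 30               # assumed coverage per person


def per_person(total_usd: float, people: float) -> float:
    """Spend per person implied by a total budget."""
    return total_usd / people


def raw_bases(people: float, genome_bases: float, depth: int) -> float:
    """Raw sequenced bases needed to cover every person at the given depth."""
    return people * genome_bases * depth


if __name__ == "__main__":
    print(f"Budget per person: ${per_person(BUDGET_USD, PEOPLE):,.0f}")
    print(f"WGS estimate per person: ${per_person(WGS_TOTAL_USD, PEOPLE):,.0f}")
    print(f"Raw bases at {DEPTH}x: {raw_bases(PEOPLE, GENOME_BASES, DEPTH):.2e}")
```

With these inputs it works out to $100 per person for the stated budget versus $10,000 per person for the whole-genome estimate, and about 3.7 × 10^17 raw bases under the assumed genome size and depth.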
Iceland has managed to guess the complete sequence of all its residents from SNPs by fully sequencing 3% of them. (Not that crime-fighting would use anything more than SNPs.)
Does not compute.
You can “guess” some statistical averages for the whole population, but you cannot “guess” the complete sequence for any particular individual.
Of course you can. If you have a giant, complete pedigree for most or all of the population, and you have SNPs or whole genomes for a small fraction of its members, and especially if it’s a highly homogeneous population, then you can impute full genomes for any particular entry (person) in the family tree, with varying but still far-better-than-whole-population-base-rate accuracy. They’re all highly correlated. This is no odder than noting that you can infer a lot about a parent’s genome from one or two children’s genomes despite never seeing the parent’s genome. Your first cousin’s genome says a lot about your genome, and even more if it can be placed in a family tree alongside one of your grandparents’ genomes. And if you have all the family trees, and samples from most of the people in them...
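A toy illustration of the parent-from-children point, not deCODE’s actual method: a minimal Python sketch that computes the posterior genotype of one unsequenced parent at a single biallelic SNP from the genotypes of its full-sibling children. It assumes Mendelian transmission, Hardy-Weinberg priors with a known alt-allele frequency `p`, an unobserved other parent drawn from the same prior, and no genotyping error; real imputation works over linked haplotypes and many markers at once. The function names are hypothetical.

```python
def hwe_prior(p: float) -> list[float]:
    """Hardy-Weinberg probabilities of carrying 0, 1, or 2 alt alleles."""
    q = 1.0 - p
    return [q * q, 2 * p * q, p * p]


def child_prob(k: int, ga: int, gb: int) -> float:
    """P(child carries k alt alleles | parents carry ga and gb alt alleles)."""
    ta, tb = ga / 2, gb / 2  # chance each parent transmits the alt allele
    if k == 0:
        return (1 - ta) * (1 - tb)
    if k == 1:
        return ta * (1 - tb) + (1 - ta) * tb
    return ta * tb


def parent_posterior(children: list[int], p: float) -> list[float]:
    """Posterior over one parent's genotype (0/1/2 alt alleles), given the
    genotypes of full siblings; the other parent is unobserved and is
    marginalized out under the same Hardy-Weinberg prior."""
    prior = hwe_prior(p)
    weights = []
    for ga in range(3):
        likelihood = 0.0
        for gb in range(3):
            # Children are conditionally independent given both parents.
            term = prior[gb]
            for k in children:
                term *= child_prob(k, ga, gb)
            likelihood += term
        weights.append(prior[ga] * likelihood)
    total = sum(weights)
    return [w / total for w in weights]
```

For example, `parent_posterior([2, 2], 0.5)` moves the parent from the prior `[0.25, 0.5, 0.25]` to `[0, 1/3, 2/3]`: two homozygous-alt children rule out a homozygous-ref parent entirely, without ever observing the parent.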
(This will not work as well for Kuwait: while its citizens may be highly inbred, they lack comparable genealogical records, and citizens are, IIRC, outnumbered by resident foreigners drawn from all over the world, especially from poor countries. But it does work for Iceland.)
All the coverage says that they used pedigrees, but I’d think the pedigrees could be reconstructed from SNPs, and rather more accurately.
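One simple signal behind SNP-based relationship inference: a parent and child always share at least one allele, so (barring genotyping errors and new mutations) they should never be opposite homozygotes at a SNP, whereas unrelated people are opposite homozygotes at a non-trivial fraction of common SNPs. The Python sketch below computes that opposite-homozygote (IBS0) rate; it is a single screening statistic under those assumptions, not a pedigree-reconstruction algorithm, and the function name is hypothetical.

```python
def opposite_homozygote_rate(a: list[int], b: list[int]) -> float:
    """Fraction of SNPs where one person has 0 alt alleles and the other 2
    (identity-by-state 0). Genotypes are coded as alt-allele counts 0/1/2;
    missing calls are not handled in this sketch."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty genotype vectors")
    ibs0 = sum(1 for x, y in zip(a, b) if abs(x - y) == 2)
    return ibs0 / len(a)
```

A rate near zero across many SNPs is consistent with a parent-offspring pair; telling siblings, cousins, and unrelated people apart needs more careful identity-by-descent estimators.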
Throwing away data is rarely helpful.
True. But when the OP says “guess the complete sequence” I assume a much higher accuracy than just somewhat better than the base rate.
You can produce an estimate of the full sequence just from knowing that the subject is human (with low accuracy); you can produce a better one if you know the subject’s race, and an even better one if you know the specific ethnic background, etc. It’s still a statistical estimate, and as such it is quite different from actually sequencing the DNA of a specific individual.
How much higher would that be, and how do you know the Icelandic imputations do not meet your standards?
An ‘actual’ sequence is itself a ‘statistical estimate’, since even with 30x coverage there will still be a lot of errors… (It’s statistics all the way down, is what I’m saying.) For many purposes, the imputation can be good enough. DNA databases have already shown their utility in tracking down criminals who are not themselves in the database but whose relatives are. From a Kuwaiti perspective, your quibbles are uninteresting.
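To make the “even with 30x coverage” point concrete, a minimal Python sketch, assuming exactly 30 reads at a heterozygous site, each read independently drawn from either allele with probability 0.5, and a hypothetical caller that misses an allele seen in 4 or fewer reads. Real variant callers use per-base qualities and genotype likelihoods, so this only shows that sampling noise alone leaves a small but non-zero miss rate per site.

```python
from math import comb


def prob_allele_undersampled(depth: int, max_reads: int) -> float:
    """P(a given allele at a heterozygous site shows up in at most `max_reads`
    of `depth` reads), with reads ~ Binomial(depth, 0.5)."""
    return sum(comb(depth, k) for k in range(max_reads + 1)) / 2 ** depth


def prob_either_undersampled(depth: int, max_reads: int) -> float:
    """P(either allele is undersampled). The two tails are disjoint only when
    2 * max_reads < depth, so the simple doubling is checked for that."""
    if 2 * max_reads >= depth:
        raise ValueError("tails overlap; doubling a single tail is not valid")
    return 2 * prob_allele_undersampled(depth, max_reads)
```

Here `prob_allele_undersampled(30, 4)` is 31931 / 2**30, roughly 3 × 10^-5 for one allele: small per site, but a genome has millions of heterozygous sites.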
You don’t look like a Kuwaiti :-P And, of course, interestingness is in the eye of the beholder...