gwern comments on Open Thread, Jul. 20 - Jul. 26, 2015

gwern 22 Jul 2015 18:40 UTC
6 points
0

> but you cannot “guess” the complete sequence for any particular individual.

Of course you can. If you have a giant complete pedigree for most or all of the population and you have SNPs or whole genomes for a small fraction of the members, and especially if it’s a highly homogeneous population, then you can impute full genomes with varying but still-far-better-than-whole-population-base-rate accuracy for any particular entry (person) in the family tree. They’re all highly correlated. This is no odder than noting that you can infer a lot about a parent’s genome from one or two children’s genomes despite never seeing the parent’s genome. Your first cousin’s genome says a lot about your genome, and even more if one can put it into a family tree and also has one of your grandparents’ genomes. And if you have all the family trees and samples from most of them...

(This will not work too well for Kuwait: while the citizens may be highly inbred, they do not have the same genealogical records, and citizens are, IIRC, outnumbered by resident foreigners, who are drawn from all over the world and especially from poor countries. But it does work for Iceland.)
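
To make the parent-from-children point concrete, here is a toy sketch, not how the Icelandic imputation was actually done. It looks at a single biallelic SNP and asks what the children's genotypes say about one unsequenced parent. Everything in it is an assumption for illustration: the function names are made up, genotypes are coded as alt-allele counts (0/1/2), both parents are treated as unrelated draws from a Hardy-Weinberg population with alt-allele frequency `p`, and genotyping error and linkage between markers are ignored. Real imputation leans on haplotypes shared across many markers, which this ignores entirely.

```python
from itertools import product


def hwe_prior(p):
    """Hardy-Weinberg genotype prior; genotype = number of alt alleles (0, 1, 2)."""
    q = 1.0 - p
    return {0: q * q, 1: 2 * p * q, 2: p * p}


def transmit(g):
    """Probability that a parent with genotype g passes on the alt allele."""
    return g / 2.0


def child_prob(child, mother, father):
    """P(child genotype | both parental genotypes) under Mendelian inheritance."""
    m, f = transmit(mother), transmit(father)
    return {0: (1 - m) * (1 - f), 1: m * (1 - f) + (1 - m) * f, 2: m * f}[child]


def parent_posterior(children, p):
    """Posterior over one unobserved parent's genotype given the children's
    genotypes, marginalizing over the other (also unobserved) parent."""
    prior = hwe_prior(p)
    post = {g: 0.0 for g in prior}
    for mother, father in product(prior, repeat=2):
        like = prior[mother] * prior[father]
        for c in children:
            like *= child_prob(c, mother, father)
        post[mother] += like
    total = sum(post.values())
    return {g: v / total for g, v in post.items()}


if __name__ == "__main__":
    print("base rate:       ", hwe_prior(0.1))
    print("one child (2):   ", parent_posterior([2], 0.1))
    print("children (2, 0): ", parent_posterior([2, 0], 0.1))
```

Under these assumptions, with `p = 0.1` the base rate for a heterozygous parent is 0.18; a single homozygous-alt child moves that to 0.9, and one homozygous-alt child plus one homozygous-ref child pins the parent down as heterozygous. A pedigree chains this kind of update across many relatives and many markers.
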
- Douglas_Knight 22 Jul 2015 19:31 UTC
  0 points
  0
  All the coverage says that they used pedigrees, but I’d think that the pedigrees could be reconstructed from SNPs, and rather more accurately.
  - gwern 22 Jul 2015 19:35 UTC
    5 points
    0
    Throwing away data is rarely helpful.
- Lumifer 22 Jul 2015 18:53 UTC
  0 points
  0
  
  > you can impute full genomes with varying but still-better-than-whole-population-base-rate accuracy for any particular entry in the family tree.
  
  True. But when the OP says “guess the complete sequence” I assume a much higher accuracy than just somewhat better than the base rate.
  
  You can produce an estimate for the full sequence just on the basis of knowing that the subject is human (with some low accuracy), you can produce a better estimate if you know the subject’s race, you can produce an even better one if you know the specific ethnic background, etc. It’s still a statistical estimate and as such is quite different from actually sequencing the DNA of a specific individual.
  - gwern 22 Jul 2015 19:04 UTC
    6 points
    0
    
    > I assume a much higher accuracy than just somewhat better than the base rate.
    
    How much higher would that be and how do you know the Icelandic imputations do not meet your standards?
    
    > It’s still a statistical estimate and as such is quite different from actually sequencing the DNA of a specific individual.
    
    An ‘actual’ sequence is itself a ‘statistical estimate’, since even with 30x coverage there will still be a lot of errors… (It’s statistics all the way down, is what I’m saying.) For many purposes, the imputation can be good enough. DNA databases have already shown their utility in tracking down criminals who are not themselves in the database but whose relatives are. From a Kuwaiti perspective, your quibbles are uninteresting.
    - Lumifer 22 Jul 2015 19:29 UTC
      −3 points
      0
      
      > From a Kuwaiti perspective, your quibbles are uninteresting.
      
      You don’t look like a Kuwaiti :-P And, of course, interestingness is in the eye of the beholder...