gwern comments on johnswentworth’s Shortform

gwern 24 Oct 2024 1:19 UTC
21 points
3

With SNPs, there’s tens of thousands of different SNPs which would each need to be targeted differently. With high copy sequences, there’s a relatively small set of different sequences.

No, rare variants are no silver bullet here. The set is not small, it is larger: there are probably combinatorially more rare variants, because there are so many ways to screw up a genome beyond the limited set of ways defined by a single-nucleotide polymorphism. That is why it’s hard to either select on or edit rare variants. They have larger (harmful) effects due to being rare, yes, and they account for a large chunk of heritability, yes, but there are so many possible rare mutations that each one has only a few instances worldwide, which makes their effects hard to estimate correctly via pure GWAS-style approaches. And they tend to be large or structural, and so extremely difficult to edit safely compared to editing a single base-pair. (If it’s hard to even sequence a CNV, how are you going to edit it?)

They definitely contribute a lot of the missing heritability (see GREML-KIN), but that doesn’t mean you can feasibly do much about them. If there are tens of millions of possible rare variants across the entire population, but each is present in only a handful of individuals (as suggested by the GREML-KIN variance components, where the family level accounts for a lot of variance), it’s difficult to estimate their effects well enough to know whether you want to select against or edit them in the first place. (Their larger effect sizes don’t help you nearly as much as their rarity hurts you.)
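
To make the rarity point concrete, here is a rough sketch of how the standard error of a per-allele effect estimate scales with allele frequency. It assumes a simple additive linear model with Hardy-Weinberg genotype variance 2p(1-p); the cohort size, the residual SD of 15 IQ points, and the function name `effect_se` are illustrative assumptions, not figures from any particular study.

```python
import math

def effect_se(n_samples, allele_freq, residual_sd=15.0):
    """Approximate SE of a per-allele effect in a simple additive regression.

    Assumes Hardy-Weinberg proportions, so the 0/1/2 allele count has
    variance 2p(1-p). residual_sd=15 is an illustrative IQ-scale value.
    """
    genotype_var = 2 * allele_freq * (1 - allele_freq)
    return residual_sd / math.sqrt(n_samples * genotype_var)

n = 1_000_000                      # hypothetical cohort size
common = effect_se(n, 0.30)        # a common SNP
rare = effect_se(n, 5 / (2 * n))   # a variant with ~5 carriers in the cohort

print(f"common SNP SE:   {common:.3f} IQ points")
print(f"rare variant SE: {rare:.3f} IQ points")
```

Under these assumptions the common SNP is estimated to within about 0.02 points, while the 5-carrier variant has a standard error of about 6.7 points, so even an effect of several points would be hard to distinguish from zero.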

So this is why if you read the CNV studies and you look at the hits they identify, and how many subjects are covered by the identified hits, you find that like, maybe 2% of the cohort will have one of those specific identified hits and lose 2 IQ points or gain 2 kg of fat etc. So you can see how that would work out in embryo selection: you’d be able to avoid that loss, which is meaningful! …in a tiny fraction of all embryos. On average, you’d just sequence them all, find no known pathogenic variant, and shrug, and use the SNP PGS like usual, having gained nothing.
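
The arithmetic behind that, as a back-of-the-envelope sketch: it takes the rough "2% of the cohort, 2 IQ points" figures above at face value and assumes embryos carry the hit at the cohort rate, which is a simplification (carrier status clusters in families).

```python
def average_gain(carrier_fraction, loss_per_carrier):
    """Mean gain per embryo from perfectly screening out known hits."""
    return carrier_fraction * loss_per_carrier

carrier_fraction = 0.02   # rough share carrying an identified hit (from the text)
loss_per_carrier = 2.0    # rough IQ points lost per carrier (from the text)

avg = average_gain(carrier_fraction, loss_per_carrier)
unaffected = 1 - carrier_fraction

print(f"average gain per embryo: {avg:.2f} IQ points")
print(f"share of embryos where screening changes nothing: {unaffected:.0%}")
```

So the carriers gain the full 2 points, but averaged over all embryos the benefit is about 0.04 points, and for 98% of embryos the extra sequencing finds nothing.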

Also, of course, WGS is substantially more expensive than SNP genotyping and more difficult to do on embryos.

If the genetic architecture had worked out otherwise, if there had instead been a lot of rare mutations which increased intelligence, then life would be a lot more convenient. Instead, it’s a lot of ‘sand in the gears’, and once you move past the easy specks of sand, they all become their own special little snowflakes.

This is why rare variants are not too promising, although they are the logical place to go after you start to exhaust common SNPs. You probably have to find an alternative approach like directly modeling or predicting the pathogenicity of a rare variant from trying to understand its biological effects, which is hard to do and hard to quantify or predict progress in. (You can straightforwardly model GWAS on common SNPs and how many samples you need and what variance your PGS will get, but predicting progress of pathogenicity predictors has no convenient approach.) Similarly, you can try very broad crude approaches like ‘select embryos with the fewest de novo mutations’… but then you lose most of the possible variance and it’ll add little.
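
As an illustration of what "straightforwardly model" means for common SNPs, here is a sketch using one commonly cited approximation for expected polygenic score accuracy (in the style of Daetwyler et al. 2008). The choice of this particular formula, the parameter values, and the name `expected_pgs_r2` are my assumptions for the example; no comparably simple curve exists for pathogenicity predictors.

```python
def expected_pgs_r2(n_samples, h2_snp, m_effective):
    """Approximate expected R^2 of a PGS for the phenotype.

    Squared accuracy of the genetic predictor is N*h2 / (N*h2 + M_e);
    multiplying by h2 converts it to variance explained in the phenotype.
    """
    accuracy_sq = n_samples * h2_snp / (n_samples * h2_snp + m_effective)
    return h2_snp * accuracy_sq

h2_snp = 0.25         # illustrative SNP heritability
m_effective = 60_000  # illustrative number of effectively independent SNPs

for n in (100_000, 1_000_000, 10_000_000):
    print(f"N = {n:>10,}: expected R^2 ~ {expected_pgs_r2(n, h2_snp, m_effective):.3f}")
```

The curve rises smoothly toward the SNP heritability as the sample grows, which is what makes progress on common SNPs easy to forecast.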
- Olli Savolainen 25 Oct 2024 15:19 UTC
  3 points
  0
  So this is why if you read the CNV studies and you look at the hits they identify, and how many subjects are covered by the identified hits, you find that like, maybe 2% of the cohort will have one of those specific identified hits and lose 2 IQ points or gain 2 kg of fat etc. So you can see how that would work out in embryo selection: you’d be able to avoid that loss, which is meaningful! …in a tiny fraction of all embryos. On average, you’d just sequence them all, find no known pathogenic variant, and shrug, and use the SNP PGS like usual, having gained nothing.
  Also, of course, WGS is substantially more expensive than SNP genotyping and more difficult to do on embryos.
  That is relevant in pre-implantation diagnosis for parents and gene therapy at the population level. But for Kwisatz Haderach breeding purposes those costs are immaterial. There the main bottleneck is the iteration of selection, or making synthetic genomes. Going for the most typical genome with the least amount of originality is not a technical challenge in itself, right? We would not be interested in the effect of the ugliness, only in getting it out.
  - gwern 26 Oct 2024 0:07 UTC
    4 points
    0
    
    There the main bottleneck is the iteration of selection, or making synthetic genomes. Going for the most typical genome with the least amount of originality is not a technical challenge in itself, right?
    
    Right.
    
    If you are doing genome synthesis, you aren’t frustrated by the rare variant problems as much because you just aren’t putting them in in the first place; therefore, there is no need either to identify the specific ones you need to remove from a ‘wild’ genome or to make highly challenging edits. (This is the ‘modal genome’ baseline. I believe it has still not been statistically modeled at all.)
    
    Whereas if you are doing iterated embryo selection, you can similarly rely mostly on maximizing the common SNPs, which provide many SDs of possible improvement, and where you have poor statistical guidance on a variant, simply default to selecting against it and moving towards a quasi-modal genome. (Essentially using rare-variant count as a tiebreaker and slowly washing out all of the rare variants from your embryo-line population. You will probably wind up with a lot in the final ones anyway, but oh well.)
  - johnswentworth 25 Oct 2024 16:42 UTC
    4 points
    0
    Yeah, separate from both the proposal at top of this thread and GeneSmith’s proposal, there’s also the “make the median human genome” proposal—the idea being that, if most of the variance in human intelligence is due to mutational load (i.e. lots of individually-rare mutations which are nearly-all slightly detrimental), then a median human genome should result in very high intelligence. The big question there is whether the “mutational load” model is basically correct.