johnswentworth comments on johnswentworth’s Shortform

johnswentworth 23 Oct 2024 19:21 UTC
61 points
−2
A Different Gambit For Genetically Engineering Smarter Humans?
Background: Significantly Enhancing Adult Intelligence With Gene Editing, Superbabies
Epistemic Status: @GeneSmith or @sarahconstantin or @kman or someone else who knows this stuff might just tell me where the assumptions underlying this gambit are wrong.
I’ve been thinking about the proposals linked above, and asked a standard question: suppose the underlying genetic studies are Not Measuring What They Think They’re Measuring. What might they be measuring instead, how could we distinguish those possibilities, and what other strategies does that suggest?
… and after going through that exercise I mostly think the underlying studies are fine, but they’re known to not account for most of the genetic component of intelligence, and there are some very natural guesses for the biggest missing pieces, and those guesses maybe suggest different strategies.
The Baseline
Before sketching the “different gambit”, let’s talk about the baseline, i.e. the two proposals linked at top. In particular, we’ll focus on the genetics part.
GeneSmith’s plan focuses on single nucleotide polymorphisms (SNPs), i.e. places in the genome where a single base-pair sometimes differs between two humans. (This type of mutation is in contrast to things like insertions or deletions.) GeneSmith argues pretty well IMO that just engineering all the right SNPs would be sufficient to raise a human’s intelligence far beyond anything which has ever existed to date.
GeneSmith cites this Steve Hsu paper, which estimates via a simple back-of-the-envelope calculation that there are probably on the order of 10k relevant SNPs, each present in ~10% of the population on average, each mildly deleterious.
Conceptually, the model here is that IQ variation in the current population is driven mainly by mutation load: new mutations are introduced at a steady pace, and evolution kills off the mildly-bad ones (i.e. almost all of them) only slowly, so there’s an equilibrium with many random mildly-bad mutations. Variability in intelligence comes from mostly-additive contributions from those many mildly-bad mutations. Important point for later: the arguments behind that conceptual model generalize to some extent beyond SNPs; they’d also apply to other kinds of mutations.
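A minimal sketch of what that additive picture implies for the spread of mutation counts, assuming each of the ~10k sites is an independent coin flip with ~10% frequency. Independence, identical frequencies, and ignoring diploidy are simplifications of this sketch rather than claims from the Hsu paper, and `mutation_load_stats` is a hypothetical helper name.

```python
import math

def mutation_load_stats(n_sites: int, freq: float) -> tuple[float, float]:
    """Mean and SD of the per-person count of deleterious variants.

    Assumes independent sites, each carrying the variant with probability
    `freq` (a binomial model; diploidy is ignored for simplicity).
    """
    mean = n_sites * freq
    sd = math.sqrt(n_sites * freq * (1.0 - freq))
    return mean, sd

mean, sd = mutation_load_stats(10_000, 0.10)
print(f"mean load = {mean:.0f} variants, SD = {sd:.0f} variants")

# Under this toy model, someone k SDs better than average carries
# roughly mean - k * sd deleterious variants.
for k in (0, 2, 4, 6):
    print(f"+{k} SD -> about {mean - k * sd:.0f} variants")
```

With these inputs the mean is 1000 variants and the SD is 30, so even extreme outliers differ from the average by only a couple hundred variants.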
What’s Missing?
Based on a quick googling, SNPs are known to not account for the majority of genetic heritability of intelligence. This source cites a couple others which supposedly upper-bound the total SNP contribution to about 25% of IQ variability (using a method which does not require identifying all the relevant SNPs, though I don’t know the details of that method). Estimates of the genetic component of IQ tend to be 50-70%, so SNPs are about half or less.
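The "about half or less" arithmetic, as a small sketch. The 25% and 50-70% figures are the ones quoted above; `snp_fraction_of_heritability` is a hypothetical helper name.

```python
def snp_fraction_of_heritability(snp_share: float, heritability: float) -> float:
    """Fraction of the genetic component of IQ variance attributable to SNPs."""
    return snp_share / heritability

snp_share = 0.25  # upper bound on the SNP contribution to IQ variance, as quoted above
for heritability in (0.50, 0.70):
    frac = snp_fraction_of_heritability(snp_share, heritability)
    print(f"heritability {heritability:.0%}: SNPs explain up to {frac:.0%} of the genetic component")
```

So the SNP share of the genetic component lands somewhere between roughly a third and a half, depending on which heritability estimate is used.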
Notably, IIRC, attempts to identify which mutations account for the rest by looking at human genetic datasets have also mostly failed to close the gap. (Though I haven’t looked closely into that piece, so this is a place where I’m at particularly high risk of being wrong.)
So what’s missing?
Guess: Copy Count Variation of Microsats/Minisats/Transposons
We’re looking for some class of genetic mutations, which wouldn’t be easy to find in current genetic datasets, have mostly-relatively-mild effects individually, are reasonably common across humans, and of which there are many in an individual genome.
Guess: sounds like variation of copy count in sequences with lots of repeats/copies, like microsatellites/minisatellites or transposons.
Most genetic sequencing for the past 20 years has been shotgun sequencing, in which we break the genome up into little pieces, sequence the little pieces, then computationally reconstruct the whole genome later. That method works particularly poorly for sequences which repeat a lot, so we have relatively poor coverage and understanding of copy counts/repeat counts for such sequences. So it’s the sort of thing which might not have already been found via sequencing datasets, even though at least half the genome consists of these sorts of sequences.
Notably, these sorts of sequences typically have unusually high mutation rates. So there’s lots of variation across humans. Also, there’s been lots of selection pressure for the effects of those mutations to be relatively mild.
What Alternative Strategies Would This Hypothesis Suggest?
With SNPs, there’s tens of thousands of different SNPs which would each need to be targeted differently. With high copy sequences, there’s a relatively small set of different sequences. So the engineering part could be quite a lot easier, if we don’t need to do different things with different copies. For instance, if the problem boils down to “get rid of live L1 transposons” or “lengthen all the XYZ repeat sequences”, that would probably be simpler engineering-wise than targeting 10k SNPs.
The flip side is that there’s more novel science to do. The main thing we’d want is deep sequencing data (i.e. sequencing where people were careful to get all those tricky high-copy parts right) with some kind of IQ score attached (or SAT, or anything else highly correlated with g-factor). Notably, we might not need a very giant dataset, as is needed for SNPs. Under (some versions of) the copy count model, there aren’t necessarily thousands of different mutations which add up to yield the roughly-normal trait distribution we see. Instead, there’s independent random copy events, which add up to a roughly-normal number of copies of something. (And the mutation mechanism makes it hard for evolution to fully suppress the copying, which is why it hasn’t been selected away; transposons are a good example.)
So, main steps:
- Get a moderate-sized dataset of deep sequenced human genomes with IQ scores attached.
- Go look at it, see if there’s something obvious like “oh hey centromere size correlates strongly with IQ!” or “oh hey transposon count correlates strongly with IQ!”
- If we find anything, go engineer that thing specifically, rather than 10k SNPs.
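The "go look at it" step could be prototyped on synthetic data before any real dataset exists. The sketch below assumes a toy version of the copy count model: each person's copy count is a sum of independent random copy events, and IQ depends linearly on that count plus noise. Every number here (200 possible events, 30% event probability, 1.5 IQ points per copy, noise SD of 11.5, cohort of 500) is made up for illustration, and `simulate_cohort` and `pearson` are hypothetical helper names.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def simulate_cohort(n_people, n_events=200, p_copy=0.3,
                    iq_per_copy=-1.5, noise_sd=11.5, seed=0):
    """Toy cohort: copy count = number of independent copy events that fired."""
    rng = random.Random(seed)
    mean_copies = n_events * p_copy
    copies, iqs = [], []
    for _ in range(n_people):
        c = sum(rng.random() < p_copy for _ in range(n_events))
        # Linear effect of copy count on IQ, plus everything else as Gaussian noise.
        iq = 100 + iq_per_copy * (c - mean_copies) + rng.gauss(0, noise_sd)
        copies.append(c)
        iqs.append(iq)
    return copies, iqs

copies, iqs = simulate_cohort(500)
r = pearson(copies, iqs)
print(f"copy count vs IQ: r = {r:.2f}, r^2 = {r * r:.2f}")
```

The point of the toy is only that a single summary statistic with a strong effect would be visible in a few hundred genomes, whereas thousands of individually tiny SNP effects need far larger samples to estimate.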
- gwern 24 Oct 2024 1:19 UTC
  21 points
  3
  Parent
  
  With SNPs, there’s tens of thousands of different SNPs which would each need to be targeted differently. With high copy sequences, there’s a relatively small set of different sequences.
  
  No, rare variants are no silver bullet here. There’s not a small set, there’s a larger set—there would probably be combinatorially more rare variants because there are so many ways to screw up genomes beyond the limited set of ways defined by a single-nucleotide polymorphism, which is why it’s hard to either select on or edit rare variants: they have larger (harmful) effects due to being rare, yes, and account for a large chunk of heritability, yes, but there are so many possible rare mutations that each one has only a few instances worldwide which makes them hard to estimate correctly via pure GWAS-style approaches. And they tend to be large or structural and so extremely difficult to edit safely compared to editing a single base-pair. (If it’s hard to even sequence a CNV, how are you going to edit it?)
  
  They definitely contribute a lot of the missing heritability (see GREML-KIN), but that doesn’t mean you can feasibly do much about them. If there are tens of millions of possible rare variants, across the entire population, but they are present in only a handful of individuals apiece (as estimated by the GREML-KIN variance components where the family-level accounts for a lot of variance), it’s difficult to estimate their effect to know if you want to select against or edit them in the first place. (Their larger effect sizes don’t help you nearly as much as their rarity hurts you.)
  
  So this is why if you read the CNV studies and you look at the hits they identify, and how many subjects are covered by the identified hits, you find that like, maybe 2% of the cohort will have one of those specific identified hits and lose 2 IQ points or gain 2 kg of fat etc. So you can see how that would work out in embryo selection: you’d be able to avoid that loss, which is meaningful! …in a tiny fraction of all embryos. On average, you’d just sequence them all, find no known pathogenic variant, and shrug, and use the SNP PGS like usual, having gained nothing.
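  The back-of-the-envelope version of that argument, as a minimal sketch. The 2% and 2-point figures are the rough ones quoted above; treating the carrier frequency among embryos as equal to the cohort frequency is an assumption of the sketch.

  ```python
  carrier_fraction = 0.02   # share of the cohort with one of the identified CNV hits
  iq_loss_if_carrier = 2.0  # IQ points lost by a carrier

  # Average loss avoided per embryo if every known hit could be screened out.
  expected_gain = carrier_fraction * iq_loss_if_carrier
  print(f"expected gain from CNV screening: {expected_gain:.2f} IQ points per embryo")
  ```

  Averaged over all embryos, the benefit is a few hundredths of an IQ point, even though it is a full 2 points for the rare carrier.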
  
  Also, of course, WGS is substantially more expensive than SNP genotyping and more difficult to do on embryos.
  
  If the genetic architecture had worked out otherwise, if there had instead been a lot of rare mutations which increased intelligence, then life would be a lot more convenient. Instead, it’s a lot of ‘sand in the gears’, and once you move past the easy specks of sand, they all become their own special little snowflakes.
  
  This is why rare variants are not too promising, although they are the logical place to go after you start to exhaust common SNPs. You probably have to find an alternative approach like directly modeling or predicting the pathogenicity of a rare variant from trying to understand its biological effects, which is hard to do and hard to quantify or predict progress in. (You can straightforwardly model GWAS on common SNPs and how many samples you need and what variance your PGS will get, but predicting progress of pathogenicity predictors has no convenient approach.) Similarly, you can try very broad crude approaches like ‘select embryos with the fewest de novo mutations’… but then you lose most of the possible variance and it’ll add little.
  - Olli Savolainen 25 Oct 2024 15:19 UTC
    3 points
    0
    Parent
    So this is why if you read the CNV studies and you look at the hits they identify, and how many subjects are covered by the identified hits, you find that like, maybe 2% of the cohort will have one of those specific identified hits and lose 2 IQ points or gain 2 kg of fat etc. So you can see how that would work out in embryo selection: you’d be able to avoid that loss, which is meaningful! …in a tiny fraction of all embryos. On average, you’d just sequence them all, find no known pathogenic variant, and shrug, and use the SNP PGS like usual, having gained nothing.
    Also, of course, WGS is substantially more expensive than SNP genotyping and more difficult to do on embryos.
    That is relevant in pre-implantation diagnosis for parents and gene therapy at the population level. But for Kwisatz Haderach breeding purposes those costs are immaterial. There the main bottleneck is the iteration of selection, or making synthetic genomes. Going for the most typical genome with the least amount of originality is not a technical challenge in itself, right? We would not be interested in the effect of the ugliness, only in getting it out.
    - gwern 26 Oct 2024 0:07 UTC
      4 points
      0
      Parent
      
      There the main bottleneck is the iteration of selection, or making synthetic genomes. Going for the most typical genome with the least amount of originality is not a technical challenge in itself, right?
      
      Right.
      
      If you are doing genome synthesis, you aren’t frustrated by the rare variant problems as much because you just aren’t putting them in in the first place; therefore, there is no need to either identify the specific ones you need to remove from a ‘wild’ genome or make highly challenging edits. (This is the ‘modal genome’ baseline. I believe it has still not been statistically modeled at all.)
      
      While if you are doing iterated embryo selection, you can similarly rely mostly on maximizing the common SNPs, which provide many SDs of possible improvement, and where you have poor statistical guidance on a variant, simply default to trying to select out against them and move towards a quasi-modal genome. (Essentially using rare-variant count as a tiebreaker and slowly washing out all of the rare variants from your embryo-line population. You will probably wind up with a lot in the final ones anyway, but oh well.)
    - johnswentworth 25 Oct 2024 16:42 UTC
      4 points
      0
      Parent
      Yeah, separate from both the proposal at top of this thread and GeneSmith’s proposal, there’s also the “make the median human genome” proposal—the idea being that, if most of the variance in human intelligence is due to mutational load (i.e. lots of individually-rare mutations which are nearly-all slightly detrimental), then a median human genome should result in very high intelligence. The big question there is whether the “mutational load” model is basically correct.
- TsviBT 24 Oct 2024 2:09 UTC
  13 points
  1
  Parent
  I didn’t read this carefully—but it’s largely irrelevant. Adult editing probably can’t have very large effects because developmental windows have passed; but either way the core difficulty is in editor delivery. Germline engineering does not require better gene targets—the ones we already have are enough to go as far as we want. The core difficulty there is taking a stem cell and making it epigenomically competent to make a baby (i.e. make it like a natural gamete or zygote).
- Towards_Keeperhood 24 Oct 2024 13:10 UTC
  8 points
  −2
  Parent
  So what’s missing?
  I haven’t looked at any of the studies and also don’t know much about genomics so my guess might be completely wrong, but a different hypothesis that seems pretty plausible to me is:
  Most of the variance of intelligence comes from how well different genes/hyperparameters-of-the-brain can work together, rather than them having individually independent effects on intelligence. E.g., as a made-up, specific, implausible example (I don’t know that much neuroscience), there could be different genes controlling the size, the synapse density, and the learning/plasticity rate of cortical columns in some region, and there are combinations of those hyperparameters which happen to work well and some that don’t fit quite as well.
  So this hypothesis would predict that we didn’t find the remaining genetic component for intelligence yet because we didn’t have enough data to see what clusters of genes together have good effects and we also didn’t know in what places to look for clusters.
  - johnswentworth 24 Oct 2024 17:13 UTC
    9 points
    2
    Parent
    Reasonable guess a priori, but I saw some data from GeneSmith at one point which looked like the interactions are almost always additive (i.e. no nontrivial interaction terms), at least within the distribution of today’s population. Unfortunately I don’t have a reference on hand, but you should ask GeneSmith if interested.
    - GeneSmith 25 Oct 2024 16:44 UTC
      6 points
      0
      Parent
      @towards_keeperhood yes this is correct. Most research seems to show ~80% of effects are additive.
      
      Genes are actually simpler than most people tend to think
      - kave 25 Oct 2024 18:03 UTC
        9 points
        2
        Parent
        I think Steve Hsu has written some about the evidence for additivity on his blog (Information Processing). He also talks about it a bit in section 3.1 of this paper.
        Towards_Keeperhood 26 Oct 2024 19:43 UTC
        3 points
        0
        Parent
        Thanks.
        So I only briefly read through the section of the paper, but I’m not really sure whether it applies to my hypothesis: My hypothesis isn’t about there being gene-combinations that are useful which were selected for, but just about there being gene-combinations that coincidentally work better without there being strong selection pressure for those to quickly rise to fixation.
        (Also yeah, for simpler properties like how much milk is produced I’d expect a much larger share of the variance to come from genes which have individual contributions. Also, for selection-based eugenics the main relevant thing is the genes which have individual contributions. (Though if we have precise ability to do gene editing we might be able to do better and see how to tune the hyperparameters to fit well together.))
        Please let me know whether I’m missing something though.
      - Towards_Keeperhood 27 Oct 2024 17:18 UTC
        3 points
        0
        Parent
        (There might be a sorta annoying analysis one could do to test my hypothesis: On my hypothesis the correlation between the intelligence of very intelligent parents and their children would be even a bit less than on the just-independent-mutations hypothesis, because very intelligent people likely also got lucky in how their gene variants work together, but those properties would be unlikely to all be passed along and end up dominant.)
      - Towards_Keeperhood 26 Oct 2024 19:28 UTC
        3 points
        0
        Parent
        Thanks for confirming.
        To clarify in case I’m misunderstanding, the effects are additive among the genes explaining the part of the IQ variance which we can so far explain, and we count that as evidence that for the remaining genetically caused IQ variance the effects will also be additive?
        I didn’t look into how the data analysis in the studies was done, but on my default guess this generalization does not work well / the additivity on the currently identified SNPs isn’t significant counterevidence for my hypothesis:
        I’d imagine that studies just correlated individual gene variants with IQ and thereby found gene variants that have independent effects on intelligence. Or did they also look at pairwise or triplet gene-variant combinations and correlate those with IQ? (There would be quite a lot of pairs, and I’m not sure whether the current datasets are large enough to robustly distinguish the combinations that really have good/bad effects from false positives.)
        One would of course expect that the effects of the gene variants which have independent effects on IQ are additive.
        But overall, except if the studies did look for higher-order IQ correlations, the fact that the IQ variance we can explain so far comes from genes which have independent effects isn’t significant evidence that the remaining genetically-caused IQ variation also comes from gene variants which have independent effects, because we were bound to find the genes which do have independent effects first.
        (I think the above should be sufficient explanation of what I think but here’s an example to clarify my hypothesis:
        Suppose gene A has variants A1 and A2 and gene B has B1 and B2. Suppose that A1 can work well with B1 and A2 with B2, but the other interactions don’t fit together that well (like badly tuned hyperparameters) and result in lower intelligence.
        When we only look at e.g. A1 and A2, neither is independently better than the other—they are uncorrelated with IQ. Studies would need to look at combinations of variants to see that e.g. A1+B1 has a slight positive correlation with intelligence—and I’m doubting whether studies did that (and whether we have sufficient data to see the signal among the combinatorial explosion of possibilities), and it would be helpful if someone clarified to me briefly how studies did the data analysis.
        )
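        The A/B example can be written out as a tiny enumeration. This is a minimal sketch, assuming made-up trait scores of +1 for matched pairs and -1 for mismatched pairs, and assuming all four combinations are equally common (50% variant frequencies, unlinked genes); none of these numbers come from any study.

        ```python
        from itertools import product

        # Hypothetical trait scores: matched pairs (A1+B1, A2+B2) work well,
        # mismatched pairs (A1+B2, A2+B1) do not.
        TRAIT = {
            ("A1", "B1"): 1.0,
            ("A2", "B2"): 1.0,
            ("A1", "B2"): -1.0,
            ("A2", "B1"): -1.0,
        }

        def mean(values):
            values = list(values)
            return sum(values) / len(values)

        # Assumption: all four combinations are equally frequent in the population.
        population = list(product(("A1", "A2"), ("B1", "B2")))

        # Marginal view: average trait by variant of gene A alone.
        marginal_a1 = mean(TRAIT[g] for g in population if g[0] == "A1")
        marginal_a2 = mean(TRAIT[g] for g in population if g[0] == "A2")

        # Pairwise view: average trait by whether the variant indices match.
        matched = mean(TRAIT[g] for g in population if g[0][1] == g[1][1])
        mismatched = mean(TRAIT[g] for g in population if g[0][1] != g[1][1])

        print(f"marginal effect of A1 vs A2: {marginal_a1:+.1f} vs {marginal_a2:+.1f}")
        print(f"matched vs mismatched pairs: {matched:+.1f} vs {mismatched:+.1f}")
        ```

        In this symmetric toy case a single-variant scan sees nothing for A, while the pairwise comparison sees the full effect. With unequal variant frequencies, part of the interaction would show up as an apparent marginal effect, so the perfect cancellation here is a property of the 50/50 assumption.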
    - Towards_Keeperhood 27 Oct 2024 8:32 UTC
      3 points
      0
      Parent
      (Thanks. I don’t think this is necessarily significant evidence against my hypothesis (see my comment on GeneSmith’s comment).)
      Another confusing relevant piece of evidence I thought I’d throw in:
      Human intelligence seems to me to be very heavy-tailed. (I assume this is uncontroversial here, just look at the greatest scientists vs great scientists.)
      If variance in intelligence was basically purely explained by mildly-deleterious SNPs, this would seem a bit odd to me: If the average person had 1000 SNPs, and then (using butt-numbers which might be very off) Einstein (+6.3std) had only 800 and the average theoretical physics professor (+4std) had 850, I wouldn’t expect the difference there to be that big.
      It’s a bit less surprising on the model where most people have a few strongly deleterious mutations, and supergeniuses are the lucky ones that have only 1 or 0 of those.
      It’s IMO even a bit less surprising on my hypothesis where in some cases the different hyperparameters happen to work much better with each other—where supergeniuses are in some dimensions “more lucky than the base genome” (in a way that’s not necessarily easy to pass on to offspring though because the genes are interdependent, which is why the genes didn’t yet rise to fixation). But even there I’d still be pretty surprised by the heavy tail.
      The heavy tail of intelligence really confuses me. (Given that it doesn’t even come from sub-critical intelligence explosion dynamics.)
      - tailcalled 27 Oct 2024 8:40 UTC
        5 points
        0
        Parent
        If each deleterious mutation decreases the success rate of something by an additive constant, but you need lots of sequential successes for intellectual achievements, then intellectual formidability is ~exponentially related to deleterious variants.
        Towards_Keeperhood 27 Oct 2024 9:44 UTC
        3 points
        0
        Parent
        Yeah, I know; that’s why I said that if a major effect was through a few significantly deleterious mutations this would be more plausible. But I feel like human intelligence is even more heavy-tailed than what one would predict given this hypothesis.
        ~~If you have many mutations that matter, then via central limit theorem the overall distribution will be roughly gaussian even though the individual ones are exponential.~~
        ~~(If I made a mistake maybe crunch the numbers to show me?)~~
        (Initially misunderstood what you meant, where I thought complete nonsense.)
        I don’t understand what you’re trying to say. Can you maybe rephrase again in more detail?
        tailcalled 27 Oct 2024 10:16 UTC
        5 points
        0
        Parent
        Suppose people’s probability of solving a task is uniformly distributed between 0 and 1. That’s a thin-tailed distribution.
        Now consider their probability of correctly solving 2 tasks in a row. That will have a sort of triangular distribution, which has more positive skewness.
        If you consider e.g. their probability of correctly solving 10 tasks in a row, then the bottom 93.3% of people will all have less than 50%, whereas e.g. the 99th percentile will have 90% chance of succeeding.
        Conjunction is one of the two fundamental ways that tasks can combine, and it tends to make the tasks harder and rapidly make the upper tail do better than the lower tail, leading to an approximately-exponential element. Another fundamental way that tasks can combine is disjunction, which leads to an exponential in the opposite direction.
        When you combine conjunctions and disjunctions, you get an approximately sigmoidal relationship. The location/x-axis-translation of this sigmoid depends on the task’s difficulty. And in practice, the “easy” side of this sigmoid can be automated or done quickly or similar, so really what matters is the “hard” side, and the hard side of a sigmoid is approximately exponential.
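        The numbers in the conjunction example can be checked directly. A minimal sketch, assuming single-task success probability is Uniform(0, 1) across people and tasks succeed independently; `fraction_below` and `success_at_percentile` are hypothetical helper names.

        ```python
        def fraction_below(threshold: float, n_tasks: int) -> float:
            """Share of people whose chance of n successes in a row is below `threshold`.

            With p ~ Uniform(0, 1), p**n < threshold exactly when p < threshold**(1/n),
            and the share of people with p below a value equals that value.
            """
            return threshold ** (1.0 / n_tasks)

        def success_at_percentile(percentile: float, n_tasks: int) -> float:
            """Chance of n successes in a row for someone at the given percentile of p."""
            return percentile ** n_tasks

        print(f"share below 50% on 10 tasks: {fraction_below(0.5, 10):.1%}")
        print(f"99th percentile on 10 tasks: {success_at_percentile(0.99, 10):.1%}")
        print(f"median person on 10 tasks:   {success_at_percentile(0.50, 10):.2%}")
        ```

        This reproduces the 93.3% and roughly 90% figures above, and shows the median person succeeding at all ten tasks only about 0.1% of the time.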
        Towards_Keeperhood 27 Oct 2024 11:03 UTC
        3 points
        0
        Parent
        Thanks!
        Is the following a fair paraphrasing of your main hypothesis? (I’m leaving out some subtleties with conjunctive successes, but please correct the model in that way if it’s relevant.):
        """
        Each deleterious mutation multiplies your probability of succeeding at a problem/thought by some constant. Let’s for simplicity say it’s 0.98 for all of them.
        Then the expected number of successes per time for a person is proportional to 0.98^num_deleterious_mutations(person).
        So the model would predict that when person A has 10 fewer deleterious mutations than person B, person B would on average accomplish 0.98^10 ~= 0.82 times as much as person A in a given timeframe.
        """
        I think this model makes a lot of sense, thanks!
        In itself I think it’s insufficient to explain how heavy-tailed human intelligence is—there were multiple cases where Einstein seems to have been able to solve problems multiple times faster than the next runners-up. But I think if you use this model in a learning setting where success means “better thinking algorithms”, then if you have 10 fewer deleterious mutations it’s like having 1/0.82 ~= 1.22 times longer training time, and there might also be compounding returns from having better thinking algorithms to getting more and richer updates to them.
        Not sure whether this completely deconfuses me about how heavy-tailed human intelligence is, but it’s a great start.
        I guess at least the heavy tail is much less significant evidence for my hypothesis than I initially thought (though so far I still think my hypothesis is plausible).
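        The multiplicative model paraphrased above, as a minimal sketch. The 0.98 per-mutation factor is the made-up number from the paraphrase, and `relative_output` is a hypothetical helper name.

        ```python
        def relative_output(extra_mutations: int, factor: float = 0.98) -> float:
            """Output of a person with `extra_mutations` more deleterious mutations,
            relative to a reference person, under a constant multiplicative penalty."""
            return factor ** extra_mutations

        ratio = relative_output(10)
        print(f"10 extra mutations  -> {ratio:.3f}x the output")
        print(f"10 fewer mutations  -> {1 / ratio:.3f}x the output")

        # The gap grows exponentially with the difference in mutation count.
        for diff in (10, 50, 100, 200):
            print(f"{diff:>3} fewer mutations -> {1 / relative_output(diff):.1f}x the output")
        ```

        Ten mutations correspond to a factor of about 0.82 in one direction and about 1.22 in the other, and the ratio compounds as the mutation gap widens.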
- rotatingpaguro 23 Oct 2024 23:16 UTC
  3 points
  0
  Parent
  Half-informed take on “the SNPs explain a small part of the genetic variance”: maybe the regression methods are bad?
  - johnswentworth 23 Oct 2024 23:50 UTC
    3 points
    0
    Parent
    Two responses:
    It’s a pretty large part—somewhere between a third and half—just not a majority.
    I was also tracking that specific hypothesis, which was why I specifically flagged “about 25% of IQ variability (using a method which does not require identifying all the relevant SNPs, though I don’t know the details of that method)”. Again, I don’t know the method, but it sounds like it wasn’t dependent on details of the regression methods.
