Thanks for leaving such thorough and thoughtful feedback!
You could elect to use proxy measures like educational attainment, SAT/ACT/GRE score, most advanced math class completed, etc., but my intuition is that they are influenced by too many things other than pure g to be useful for the desired purpose. It’s possible that I’m being too cynical about this obstacle and I would be delighted if someone could give me good reasons why I’m wrong.
The SAT is heavily g-loaded: r = .82 according to Wikipedia, so ~2/3 of the variance comes from g and ~1/3 from other stuff (minus whatever variance is testing noise). So naively, assuming no noise and that the genetic correlations mirror the phenotypic correlations, if you did embryo selection on SAT you’d be getting .82*h_pred/sqrt(2) SDs of g and .57*h_pred/sqrt(2) SDs of ‘other stuff’ for every SD of selection power you exert on your embryo pool (.57 = sqrt(1 - .82^2); h_pred^2 is the variance in SAT explained by the predictor; we divide by sqrt(2) because sibling genotypes have ~1/2 the variance of the wider population). Which is maybe not good; maybe you don’t want that much of the ‘other stuff’, e.g. if it includes personality traits.
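To make the arithmetic concrete, here’s a minimal sketch in Python under the same naive assumptions (no test noise, genetic correlations mirroring the phenotypic ones); the h_pred value is a hypothetical placeholder, not an estimate from any real predictor:

```python
import math

# Naive decomposition: SAT variance splits into a g share (r^2) and an
# orthogonal 'other stuff' share (1 - r^2), and genetic correlations are
# assumed to mirror the phenotypic ones.
r_g = 0.82                   # SAT's g-loading (Frey & Detterman, via Wikipedia)
h_pred = math.sqrt(0.20)     # hypothetical: predictor explains 20% of SAT variance

loading_other = math.sqrt(1 - r_g**2)   # ~0.57

# Gains per SD of selection pressure on the embryo pool; the sqrt(2) reflects
# sibling genotypes having ~1/2 the variance of the wider population.
gain_g = r_g * h_pred / math.sqrt(2)
gain_other = loading_other * h_pred / math.sqrt(2)

print(f"g gained per SD of selection:             {gain_g:.2f} SD")
print(f"'other stuff' gained per SD of selection: {gain_other:.2f} SD")
```

With those placeholder numbers you’d get roughly 0.26 SD of g and 0.18 SD of ‘other stuff’ per SD of selection.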
It looks like the SAT isn’t much correlated with personality at all. The biggest correlation is with openness, which is unsurprising given the correlation between openness and IQ. I figured conscientiousness might be a bit correlated, but it’s actually slightly anticorrelated with the SAT, despite being correlated with GPA. So maybe it’s more that you’re measuring specific abilities as well as g (e.g. non-g components of math and verbal ability).
Another thing: if you have a test for which g explains the lion’s share of the heritable variance, but there are also other traits which contribute heritable variance, and the other traits are about as polygenic as g (a similar number of causal variants), then by picking the top-N expected-effect-size edits you’ll probably mostly/entirely end up editing variants which affect g, since g’s larger share of variance spread over a similar number of variants means its variants tend to have the larger per-variant effects. (That said, if the other traits are significantly less polygenic than g, then the opposite would happen.)
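Here’s a toy simulation of that intuition (purely illustrative numbers; equal polygenicity and disjoint causal variants assumed): give g ~2/3 of the heritable test variance and a single ‘other’ trait ~1/3, then see which variants survive a top-N cut on expected effect size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: equal polygenicity, but g contributes ~2/3 of the heritable
# test variance and the 'other' trait ~1/3 (numbers are illustrative).
n_variants = 10_000
var_g, var_other = 2 / 3, 1 / 3

# Per-variant effects on the test score, scaled so each trait's variants
# jointly contribute its share of the variance.
beta_g = rng.normal(0, np.sqrt(var_g / n_variants), n_variants)
beta_other = rng.normal(0, np.sqrt(var_other / n_variants), n_variants)

effects = np.concatenate([beta_g, beta_other])
labels = np.array(["g"] * n_variants + ["other"] * n_variants)

top_n = 1000
top_idx = np.argsort(-np.abs(effects))[:top_n]
print(f"Fraction of top-{top_n} edits hitting g variants: "
      f"{np.mean(labels[top_idx] == 'g'):.2f}")
```

With this split, roughly 80–90% of the top edits hit g variants; shrink the number of causal variants for the ‘other’ trait and the balance flips, as noted above.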
this would be extremely expensive, as even the cheapest professional IQ tests cost at least $100 to administer
Getting old SAT scores could be much cheaper, I imagine (though doing this would still be very difficult). Also, as GeneSmith pointed out we aren’t necessarily limited to western countries. Assembling a large biobank including IQ scores or a good proxy might be much cheaper and more socially permissible elsewhere.
The barriers involved in engineering the delivery and editing mechanisms are different beasts.
I do basically expect the delivery problem will be gated by missing breakthroughs, since otherwise I’d expect the literature to be full of more impressive results than it actually is. (E.g. why has no one used angiopep-coated LNPs to deliver editors to mouse brains, as far as I can find? I guess it doesn’t work very well? Has anyone actually tried, though?)
Ditto for editors, though I’m somewhat more optimistic there for a handful of reasons:
sequence-dependent off-targets can be predicted
so you can maybe avoid edits that risk catastrophic off-targets
unclear how big of a problem errors at noncoding target sites will be (though after reading some replies pointing out that regulatory binding sites are highly sensitive I’m a bit more pessimistic about this than I was)
even if they are a big problem, dCas9-based ABEs have extremely low indel rates and incorrect base conversions, though bystanders are still a concern
though if you restrict yourself to ABEs and are careful to avoid bystanders, your pool of variants to target has shrunk way down
I mean, your basic argument was “you’re trying to do 1000 edits, and the risks will mount with each edit you do”, which yeah, maybe I’m being too optimistic here (e.g. even if not a problem at most target sites, errors will predictably be a big deal at some target sites, and it might be hard to predict which sites with high accuracy).
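To put rough numbers on the ‘risks mount with each edit’ point (the per-edit error probabilities below are made up purely for illustration, and errors at different sites are assumed independent):

```python
# If each edit independently causes a serious error with probability p, the
# chance of at least one serious error across n edits is 1 - (1 - p)^n.
n_edits = 1000
for p in (1e-5, 1e-4, 1e-3):
    p_any = 1 - (1 - p) ** n_edits
    print(f"p = {p:.0e} per edit: P(>=1 serious error) = {p_any:.1%}, "
          f"expected serious errors = {n_edits * p:.2f}")
```

Even a 1-in-10,000 per-edit chance of something serious compounds to roughly a 10% chance across 1000 edits, which is why the per-site error rates would need to be very low.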
It’s not clear to me how far out the necessary breakthroughs are “by default” and how much they could be accelerated if we actually tried, in the sense that electric cars weren’t going anywhere until Musk came along and actually tried (though besides sounding crazy ambitious, maybe this analogy doesn’t really work if breakthroughs are just hard to accelerate with money, and AFAIK electric cars weren’t really held up by any big breakthroughs, just lack of scale). Getting delivery + editors down would have a ton of uses besides intelligence-enhancement therapy; you could target any mono-, oligo-, or polygenic disease you wanted. It doesn’t seem like the amount of effort currently being put in is commensurate with how much it would be worth, even putting ‘enhancement’ use cases aside.
one could imagine that if every 3rd or 4th or nth neuron is receiving, processing, or releasing ligands in a different way than either the upstream or downstream neurons, the result is some discordance that is more likely to be destructive than beneficial
My impression is that neurons are really noisy, and so probably not very sensitive to small perturbations in timing / signalling characteristics. I guess things could be different if the differences are permanent rather than transient, though I also wouldn’t be surprised if there was a lot of ‘spatial’ noise/variation in neural characteristics which the brain is able to cope with. Maybe this isn’t the sort of variation you mean. I completely agree that it’s more likely to be detrimental than beneficial; it’s a question of how badly detrimental.
Another thing to consider: do the causal variants additively influence an underlying lower dimensional ‘parameter space’ which then influences g (e.g. degree of expression of various proteins or characteristics downstream of that)? If this is the case, and you have a large number of causal variants per ‘parameter’, then if your cells get each edit with about the same frequency on average, then even if there’s a ton of mosaicism at the variant level there might not be much at the ‘parameter’ level. I suspect the way this would actually work out is that some cells will be easier to transfect than others (e.g. due to the geography of the extracellular space that the delivery vectors need to diffuse through), so you’ll have some cells getting more total edits than others: a mix of cells with better and worse polygenic scores, which might lead to the discordance problems you suggested if the differences are big enough.
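A quick sketch of that averaging argument (all numbers hypothetical): if every cell receives each edit independently with the same probability, the per-cell ‘parameter’ (here just the mean edit dosage) concentrates tightly around its expectation despite heavy variant-level mosaicism, whereas cell-to-cell differences in transfectability reintroduce spread at the parameter level.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cells, n_edits = 10_000, 1000

# Case 1: uniform efficiency -- each (cell, edit) pair succeeds with p_edit.
p_edit = 0.3                                   # hypothetical editing efficiency
edits = rng.random((n_cells, n_edits)) < p_edit
param_uniform = edits.mean(axis=1)             # per-cell 'parameter' value

# Case 2: some cells are easier to transfect than others.
cell_p = rng.beta(2, 5, size=n_cells)          # per-cell efficiencies, mean ~0.29
edits_var = rng.random((n_cells, n_edits)) < cell_p[:, None]
param_variable = edits_var.mean(axis=1)

print(f"uniform efficiency:  SD of per-cell parameter = {param_uniform.std():.3f}")
print(f"variable efficiency: SD of per-cell parameter = {param_variable.std():.3f}")
```

In the uniform case the parameter-level spread is tiny (~0.015 here) even though each cell carries a different ~30% subset of the edits; with variable per-cell efficiency the spread is an order of magnitude larger, which is the ‘some cells end up with better polygenic scores than others’ scenario.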
For all of the reasons herein and more, it’s my personal prediction that the only ways humanity is going to get vastly smarter by artificial means is through brain machine interfaces or iterative embryo selection.
BMI seems harder than in-vivo editing to me. Wouldn’t you need a massive number of connections (10M+?) to even begin having any hope of making people qualitatively smarter? Wouldn’t you need to find an algorithm that the brain could ‘learn to use’ so well that it essentially becomes integrated as another cortical area or can serve as an ‘expansion card’ for existing cortical areas? Would you just end up bottlenecked by the characteristics of the human neurons (e.g. low information capacity due to noise)?
The SAT is heavily g-loaded: r = .82 according to Wikipedia, so ~2/3 of the variance is coming from g, ~1/3 from other stuff (minus whatever variance is testing noise). So naively, assuming no noise and that the genetic correlations mirror the phenotype correlations, if you did embryo selection on SAT, you’d be getting .82*h_pred/sqrt(2) SDs g and .57*h_pred/sqrt(2) SDs ‘other stuff’ for every SD of selection power you exert on your embryo pool (h_pred^2 is the variance in SAT explained by the predictor, we’re dividing by sqrt(2) because sibling genotypes have ~1/2 the variance as the wider population). Which is maybe not good; maybe you don’t want that much of the ‘other stuff’, e.g. if it includes personality traits.
The article that Wikipedia cites for that factoid, Frey & Detterman 2004, uses data from the National Longitudinal Survey of Youth 1979, which included both SAT and ASVAB scores (the ASVAB is what they used to estimate IQ, so you first need the correlation between the ASVAB and actual FSIQ). This introduces the huge caveat that the SAT has changed drastically since those data were collected and has likely been much less strongly correlated with g ever since 1994. That is when the College Board began recentering scores and changing the scoring methodology, making year-to-year comparisons of scores no longer apples to apples.

The real killer was the revision of the math and verbal sections to mostly include questions that “approximate more closely the skills used in college and high school work”, to get rid of “contrived word problems” (e.g., the types of verbal-ability questions you’d see on an IQ test), and to include “real-world” problems that may be more relevant to students. Since the test became more focused on assessing knowledge rather than aptitude, this overhaul of the scoring and question format made it much more closely resemble a typical academic benchmark exam than an assessment of general cognitive ability. That decreased its predictive power for general intelligence and increased its predictive power for high school GPA, as well as for other things that correlate with high school GPA like academic effort, openness, and SES. It’s for these reasons that Mensa and other high-IQ societies stopped accepting the SAT as a proxy for IQ unless you took it prior to 1994.

I’ve taken both the SAT and ACT, and I cannot imagine the ACT is much better (a 2004 study showed r = 0.73). My guess is that the GRE is more strongly correlated with general intelligence than either of the other two tests (still imperfectly so; I wouldn’t put it above 0.8), but then the problem is that a much smaller fraction of the population has taken the GRE, and there is a large selection bias as to who takes it. Same with something like the LSAT. I still think the only way you will get away with cheaply assessing general intelligence is via an abridged IQ test such as the one offered by openpsychometrics.org, if it were properly normed and made a little longer.
Another thing: if you have a test for which g explains the lion’s share of the heritable variance, but there are also other traits which contribute heritable variance, and the other traits are similarly polygenic as g (similar number of causal variants), then by picking the top-N expected effect size edits, you’ll probably mostly/entirely end up editing variants which affect g. (That said, if the other traits are significantly less polygenic than g then the opposite would happen.)
I agree, but then you’re limiting yourself to whatever polymorphisms are left over after what is presumably a pseudo-arbitrary threshold, and you’d need a much larger sample size, since the effect sizes and p-values of individual SNPs would be diluted by having many more polymorphisms contributing to the phenotype. As you suggest, it is also a large inferential leap to assume this would exclusively result in variants that affect g. Refer to my reply to gwern for more about this.
Getting old SAT scores could be much cheaper, I imagine (though doing this would still be very difficult). Also, as GeneSmith pointed out we aren’t necessarily limited to western countries. Assembling a large biobank including IQ scores or a good proxy might be much cheaper and more socially permissible elsewhere.
Refer to the first paragraph of this reply and my reply to GeneSmith.
Ditto for editors, though I’m somewhat more optimistic there for a handful of reasons: (etc)
I agree; I think the delivery problem is a much taller mountain to climb than the editor problem. One reason is that editing is largely a tractable organic-chemistry problem, while delivery is almost exclusively an intractable systems-biology problem. Considering the progress that precision genome-editing tools have made in the past 10 years, I think it is reasonable to rely on other labs to discover ways to shave the noxious effects of editing alone down to near negligibility.
It’s not clear to me how far out the necessary breakthroughs are “by default” and how much they could be accelerated if we actually tried...etc
As you alluded to, the difference is that one thing was basically solved already. Making leaps forward in biology requires an insane amount of tedium and luck. Genius is certainly important too, but as with the editing-versus-delivery tractability question, engineering things like batteries involves more tractable sub-problems than getting things to work in noisy, black-box, highly variable wetware like humans.
BMI seems harder than in-vivo editing to me. Wouldn’t you need a massive number of connections (10M+?) to even begin having any hope of making people qualitatively smarter? Wouldn’t you need to find an algorithm that the brain could ‘learn to use’ so well that it essentially becomes integrated as another cortical area or can serve as an ‘expansion card’ for existing cortical areas? Would you just end up bottlenecked by the characteristics of the human neurons (e.g. low information capacity due to noise)?
Frankly, I know much less about this topic than the other stuff I’ve been talking about, so my opinions on BMIs are less strong. What has made me optimistic is the existence of brain implants that have cured people’s depression, work showing that transcranial magnetic stimulation has the potential to enhance certain cognitive domains, and existing BMIs that cure paralysis at the level of the motor cortex. Like the other things I mentioned, this also seems like a somewhat more tractable problem, considering that computational neuroscience is a very math-intensive field and AI has vast potential to assist us in figuring it out. If the problem eventually comes down to needing more and more connections, I cannot imagine it will remain a problem for long, since figuring out how to insert more fine connections into the brain sounds relatively easier than the stuff we’ve been discussing.
Another thing: if you have a test for which g explains the lion’s share of the heritable variance, but there are also other traits which contribute heritable variance, and the other traits are similarly polygenic as g (similar number of causal variants), then by picking the top-N expected effect size edits, you’ll probably mostly/entirely end up editing variants which affect g. (That said, if the other traits are significantly less polygenic than g then the opposite would happen.)
I should mention, when I wrote this I was assuming a simple model where the causal variants for g and the ‘other stuff’ are disjoint, which is probably unrealistic—there’d be some pleiotropy.