Me: “I don’t think this therapy as OP describes it is possible for reasons that have already been stated by HiddenPrior and other reasons”
kman: “Can you elaborate on this? We’d really appreciate the feedback.”
Considering the length of my response, I figured I would post it somewhere more visible to those interested. First, I’d like to express my gratitude for your and GeneSmith’s goal and motivation; I agree that, absent some brain-machine interface solution, intelligence enhancement is certainly the way forward if we’d like not only to keep up with AI, but also to break through the intellectual soft caps that have led to plateauing progress in many fields as they become ever more hyper-specialized. I don’t use this website very often, but fortunately this post was sent to me by a friend, and I decided to engage with it because I had exactly the same ideas and motivations detailed in your post when I matriculated into a PhD program for molecular genetics. I don’t want to de-anonymize myself, so I won’t name the school, but it is ranked within the top 50 graduate programs in the US and has had many Nobel laureates as faculty. I was very optimistic, as you two appear to be. My optimism was short-lived: I quickly learned how infeasible this all was at the time (and still is), eventually dropped out of the program, and figured I could make more of an impact by exerting my influence elsewhere. With that said, I’d like to amend what I said: I don’t believe that what you want to do as described in your post is impossible, but that it is infeasible.
The first impasse, which I don’t see mentioned much, is gathering adequate sample data for building a useful predictor. As you stated yourself, “it is unlikely any of the groups that currently have the data to create strong intelligence predictors (or could easily obtain it) will do so.” The reality is even worse than this: not only will the industry’s magnates decline to build strong predictors despite their ability to, there are armies of fervent bureaucrats, lawyers, and researchers employed across academia, health care, and governments who will actively attempt to prevent this from transpiring, which is part of the reason we don’t have such predictors already. Even if this were not true and you could somehow orchestrate the collection of said data, it would take a very long time and a lot of money. To tease out a sufficient abundance of causal variants for intelligence as measured by an IQ test, my guess is you’d need around 500,000 samples at the very least. This guess is based on the findings of GWASes studying other complex traits, as well as the work of those at Genomic Prediction. This would first require that you find an overlap of test subjects five hundred thousand strong that will not only volunteer to have their entire genome sequenced (a SNP array could be used to cut costs if you’re willing to sacrifice the breadth of variants interrogated), but will also sit down for an hours-long professionally-administered IQ test, like the WAIS-IV (again, could use some abridged test to cut costs and increase participation rate at the expense of lower-quality data).
Setting aside that this would take years (the 500,000 participants would come from what I’d imagine is a very sparsely distributed subset of the broader population), it would also be extremely expensive: even the cheapest professional IQ tests cost at least $100 to administer, which puts your project (or whatever entity funds these tests) at least $50,000,000 in the hole before you’ve even begun designing and conducting experiments. You could elect to use proxy measures like educational attainment, SAT/ACT/GRE score, most advanced math class completed, etc., but my intuition is that they are influenced by too many things other than pure g to be useful for the desired purpose. It’s possible that I’m being too cynical about this obstacle and I would be delighted if someone could give me good reasons why I’m wrong.
The barriers involved in engineering the delivery and editing mechanisms are different beasts. HiddenPriors already did a satisfactory job of outlining these, though he could have been more elaborate. At the risk of being just as opaque, I will give only the bottom line, because a fully detailed explanation of why this is infeasible would require more text than I want to write. As far as delivery goes, the current state of these technologies will force you to use lipid nanoparticles because of the dangers of an inflammatory response being induced in the brain by an AAV, not to mention the risk of random cell death induction by AAVs, the causes of which are poorly understood. Your risk appetite for such things must be extremely low: you cannot afford to sustain cell death in the brain if you want to keep subjects alive, never mind impart intelligence enhancements. Finding an AAV serotype that does not carry these risks would be a breakthrough on its own; finding an AAV serotype, or a sufficiently abundant collection of serotypes, that is immune to neutralization following repeated exposures is another breakthrough; and finding an AAV that could encode all of the edits you want to make (obviating the need for multiple injections) is yet another breakthrough, and frankly is probably impossible. Even with serotypes that overcome all of this, you would still have to contend with low transduction efficiency and the massive cost of producing many different custom AAVs at scale, barring yet more breakthroughs. As you mention, such challenges have a chance of being solved by the market eventually, though who knows when that will be, if it ever happens at all. These issues would be much smaller if you only wanted to make a few edits, but you want to make hundreds or thousands of edits, which necessitates administering AAVs many times and rapidly compounds the chance that these risks are realized.
After having been in the field and witnessed how this type of research is performed, what goes on under the hood at research labs, how slow progress is, and the biochemistry of such vectors, my personal take is that attempting to solve all of these problems for AAVs is akin to trying to optimize a horse for the road when what you really need is a car, and that car is probably going to end up being lipid nanoparticles. They’re cheaper, safer, intrinsically much less immunogenic, and more capacious. You will need to use plasmid DNA (as opposed to mRNA, which is where lipid nanoparticles currently shine) if you want to keep them non-immunogenic and avoid the same immunogenicity risks as AAVs, which will significantly reduce your transduction efficiency barring yet another breakthrough. Lipid nanoparticles, even though they’re generally much safer, still have the potential to be immunogenic or toxic following repeated doses or high enough concentrations, which is another hurdle because you will need to use them repeatedly considering the number of edits you’re wanting to make.
I have not even gotten to why base/prime editing, as those methods currently exist, will be problematic for the number of edits you’re wanting to make, but I will spare you because my rhetoric is getting repetitive; it boils down to what was already mentioned in your post, the replies, and the previous paragraph: the more edits you make, the much greater the chance that risks are realized, in this case things like pseudo-random off-target effects, bystander edits, and guide RNA collisions with similar loci in the genome. I also disagree with both the OP and HiddenPriors regarding the likelihood that mosaicism will be a problem. A simple thought experiment may change your mind about mosaicism in the brain: consider what would happen in the case of editing multiple loci (whether purposeful or accidental) that happen to play a role in a neuron’s internal clock. If you have a bunch of neurons releasing substrates that govern one’s circadian rhythm in a totally discordant manner, I’d have to imagine the outcome is that the organism’s circadian rhythm will be just as discordant. This can be extrapolated to signaling pathways in general among neurons, where again one could imagine that if every 3rd or 4th or nth neuron is receiving, processing, or releasing ligands in a different way than either the upstream or downstream neurons, the result is some discordance that is more likely to be destructive than beneficial. This is just a conjecture, albeit an informed one, based on my knowledge of neurobiology, and another case where I’d be delighted if someone could give me good reasons why I might be wrong.
For all of the reasons herein and more, it’s my personal prediction that the only ways humanity is going to get vastly smarter by artificial means is through brain machine interfaces or iterative embryo selection. There are many things in the OP that I could nitpick but that do not necessarily contribute to why I think this project is infeasible, and I don’t want to make this gargantuan reply any longer than it already is. I hope I wrote enough to give you a satisfactory answer for why I think this is infeasible; I would be glad to chat over email or discord if you would like to filter ideas through me after reading this.
This would first require that you find an overlap of test subjects five hundred thousand strong that will not only volunteer to have their entire genome sequenced (a SNP array could be used to cut costs if you’re willing to sacrifice the breadth of variants interrogated), but will also sit down for an hours-long professionally-administered IQ test, like the WAIS-IV (again, could use some abridged test to cut costs and increase participation rate at the expense of lower-quality data)
I am more optimistic than you here. I think it is enough to get people who have already gotten their genomes sequenced through 23&Me or some other such consumer genomics service to either take an online IQ test or submit their SAT scores. You could also cross-check this with other data such people submit to validate their answer and determine whether it is plausible.
I think this could potentially be done for a few million dollars rather than 50. In fact companies like GenomeLink.io already have these kind of third party data analysis services today.
Also, we aren’t limited to western countries. If China or Taiwan or Japan or any other country creates a good IQ predictor, it can be used for editing purposes. Ancestry doesn’t matter much for editing purposes, only for embryo selection.
Would the quality of such tests be lower than those of professionally administered IQ tests?
Of course. But sample size cures many ills.
As far as delivery goes, the current state of these technologies will force you to use lipid nanoparticles because of the dangers of an inflammatory response being induced in the brain by an AAV, not to mention the risk of random cell death induction by AAVs, the causes of which are poorly understood.
I briefly looked into this and found these papers:
I asked GPT4 whether adenoviruses enter the brain:
In general, adenoviruses are not commonly known to infect the brain or cause central nervous system diseases. Most adenovirus infections remain localized to the site where they first enter the body, such as the respiratory or gastrointestinal tracts. However, in rare cases, especially in individuals with weakened immune systems, adenoviruses can potentially spread to other organs, including the brain.
I also found this paper indicating much more problematic direct effects observed in mouse studies:
We demonstrate that neural progenitor cells (NPCs) and immature dentate granule cells (DGCs) within the adult murine hippocampus are particularly sensitive to rAAV-induced cell death. Cell loss is dose dependent and nearly complete at experimentally relevant viral titers. rAAV-induced cell death is rapid and persistent, with loss of BrdU-labeled cells within 18 hr post-injection and no evidence of recovery of adult neurogenesis at 3 months post-injection.
Also:
Efficient transduction of the dentate gyrus (DG) – without ablating adult neurogenesis – can be achieved by injection of rAAV2-retro serotyped virus into CA3
So it sounds like there are potential solutions here and this isn’t necessarily a showstopper, especially if we can derisk using animal testing in cows or pigs.
You will need to use plasmid DNA (as opposed to mRNA, which is where lipid nanoparticles currently shine) if you want to keep them non-immunogenic and avoid the same immunogenicity risks as AAVs, which will significantly reduce your transduction efficiency barring yet another breakthrough.
This is an update for me. I didn’t previously realize that the mRNA for a base or prime editor could itself trigger the innate immune system. I wonder how serious of a concern this would actually be?
If it is serious, we could potentially deliver RNPs directly to the cells in question. I think this would be plausible to do with pretty much any delivery vector except AAVs.
I don’t really see how delivering a plasmid with the DNA for the editor will be any better than delivering mRNA. The DNA will be transcribed into the exact same mRNA you would have been delivering anyways, so if the mRNA for CRISPR triggers the innate immune system thanks to CpG motifs or something, putting it in a plasmid won’t help much.
Lipid nanoparticles, even though they’re generally much safer, still have the potential to be immunogenic or toxic following repeated doses or high enough concentrations, which is another hurdle because you will need to use them repeatedly considering the number of edits you’re wanting to make.
Yeah, one other delivery vector I’ve looked into over the last couple of days is extracellular vesicles. They seem to have basically zero problems with toxicity because the body already uses them to shuttle stuff around. And you can stick peptides on their surface similar to what we proposed with lipid nanoparticles.
The downside is they are harder to manufacture. You can make lipid nanoparticles by literally putting 4 ingredients plus mRNA inside a flask together and shaking it. ECVs require manufacturing via human cell colonies and purification.
A simple thought experiment may change your mind about mosaicism in the brain: consider what would happen in the case of editing multiple loci (whether purposeful or accidental) that happen to play a role in a neuron’s internal clock. If you have a bunch of neurons releasing substrates that govern one’s circadian rhythm in a totally discordant manner, I’d have to imagine the outcome is that the organism’s circadian rhythm will be just as discordant. This can be extrapolated to signaling pathways in general among neurons, where again one could imagine that if every 3rd or 4th or nth neuron is receiving, processing, or releasing ligands in a different way than either the upstream or downstream neurons, the result is some discordance that is more likely to be destructive than beneficial.
Thanks for this example. I don’t think I would be particularly worried about this in the context of off-target edits or indels (provided the distribution is similar to that of naturally occurring mutations), but I can see it potentially being an issue if the intelligence-modifying alleles themselves work via regulating something like the neuron’s internal clock.
If this turns out to be an issue, one potential solution would be to exclude edits to genes that are problematic when mosaic. But this would probably be pretty difficult to validate in an animal model so that might just kill the project.
I am more optimistic than you here. I think it is enough to get people who have already gotten their genomes sequenced through 23&Me or some other such consumer genomics service to either take an online IQ test or submit their SAT scores. You could also cross-check this with other data such people submit to validate their answer and determine whether it is plausible.
I think this could potentially be done for a few million dollars rather than 50. In fact companies like GenomeLink.io already have these kind of third party data analysis services today.
Also, we aren’t limited to western countries. If China or Taiwan or Japan or any other country creates a good IQ predictor, it can be used for editing purposes. Ancestry doesn’t matter much for editing purposes, only for embryo selection.
Would the quality of such tests be lower than those of professionally administered IQ tests?
Of course. But sample size cures many ills.
I have experience attempting things like what you’re suggesting 23andMe do; I briefly ran a startup unrelated to genomics, and I also ran a genomics study at my alma mater. Both involved trying to get consumers or test subjects to engage with links, emails, online surveys, tests, etc., and let me be the first to tell you that this is hard for any survey longer than your average customer-satisfaction survey. If 23andMe has ~14 million customers worldwide and launches a campaign to estimate the IQ scores of its existing customers using an abridged online IQ test (which would take at least ~15-20 minutes if it is to be at all useful), it is optimistic to think even 140,000 customers will respond. This prediction has an empirical basis. In 2013, 23andMe conducted a consumer experience survey and invited only the customers most likely to respond: those who were over the age of 30, had logged into their 23andMe.com account within the two-year period prior to November 2013, were not part of any other 23andMe disease research study, and had opted to receive health results. This amounted to an anemic 20,000 customers out of its hundreds of thousands; considering 23andMe is cited as having about ~500,000 customers in 2014, we can reasonably assume they had at least ~200,000 customers in 2013. To make our estimate of the invitation rate generous, we will say they had 200,000 customers in 2013, meaning 10% of their customers received an invitation to complete the survey. Even out of this 10%, slightly less than 10% of that 10% responded to a 98-question survey, so a generous estimate of how many of their customers they got to take this survey is 1%. And this was just a consumer experience survey, which does not have nearly as much emotional and cognitive friction dissuading participants as something like an IQ test. It is counterintuitive and demoralizing, but anyone who has experience with these kinds of things will tell you the same thing.
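The participation funnel above reduces to simple arithmetic; a minimal sketch, using the rough rates estimated from the 23andMe example (not measured values):

```python
# Back-of-the-envelope participation funnel: customers -> invitees -> respondents.
# All rates here are the rough estimates discussed above, not measured values.
def expected_respondents(customers: int, invite_rate: float, response_rate: float) -> int:
    """Expected number of completed surveys given an invitation rate and a response rate."""
    return round(customers * invite_rate * response_rate)

# 2013 survey: ~200,000 customers, ~10% invited, ~10% of invitees responding.
print(expected_respondents(200_000, 0.10, 0.10))     # -> 2000, i.e. ~1% of the customer base
# Applying the same ~1% overall rate to ~14 million customers today:
print(expected_respondents(14_000_000, 0.10, 0.10))  # -> 140000
```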
If 23andMe instead asked customers to submit SAT/ACT/GRE scores, there are many other problems to account for (beyond a likely response rate of <=1% of the total customer base): dishonest or otherwise unreliable reporting; selection for things other than intelligence, like conscientiousness, openness, and socioeconomic status; the fact that the mean and standard deviation of scores differ by test year (so z-scores would have to be computed separately for each year participants took the tests); and the fact that it is much easier to hit the ceiling on the SAT/ACT/GRE (2-3 S.D., 1 in 741 at the rarest) than the ceiling of a reliable IQ test (4 S.D., which is about 1 in 30,000). Statistical models like those involved in GWASes follow one of many simple rules: crap in, crap out. If you want to find a lot of statistically significant SNPs for intelligence and you try using a shoddy proxy like standardized test score or an incomplete IQ test score as your phenotype, your GWAS is going to end up producing a bunch of shoddy SNPs for “intelligence”. Sample size (which is still an unsolved problem for the reasons aforementioned) has the potential to make up for obtaining a low amount of SNPs that have genome-wide significance, but it won’t get rid of entangled irrelevant SNPs if you’re measuring something other than straight up full-scale IQ. I really hate to be so pessimistic here, but it’s important to be realistic about these kinds of things, especially if you’re relying on this data to play a critical role in your project’s success.
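The year-to-year normalization point, at least, is mechanical once you have each cohort’s mean and standard deviation; a minimal sketch, with made-up cohort statistics rather than real College Board figures:

```python
# Convert raw SAT scores to z-scores using the statistics of the cohort that
# took the test that year. The (mean, SD) pairs below are illustrative
# placeholders, not real College Board figures.
COHORT_STATS = {
    2018: (1068.0, 204.0),
    2019: (1059.0, 210.0),
    2020: (1051.0, 211.0),
}

def sat_z_score(raw_score: float, year: int) -> float:
    """Standardize a raw score against its own test-year cohort."""
    mean, sd = COHORT_STATS[year]
    return (raw_score - mean) / sd
```

Under these placeholder statistics, a 1269 from 2019 and a 1262 from 2020 both map to a z-score of 1.0 despite the differing raw scores, which is exactly the comparability the proxy measure needs.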
Taiwan is one of the more tenable counter-examples to what I said that I also thought of, but there are still problems to overcome. In the UK Biobank, for example, the method of assessing “fluid intelligence”/“verbal-numerical ability” was totally abysmal: participants were given 2 minutes to answer 13 IQ-test-esque multiple-choice questions, and their score was the number of questions answered correctly in those 2 minutes. I hope I don’t need to explain why this is not an adequate measure of fluid intelligence, and that any IQ predictor built on that data is probably totally useless. I don’t know how Taiwan assesses intelligence in their biobank, if at all, but if they do it anything like the UK Biobank did, that data will probably end up being similarly useless. Even then, there is still the problem of inadequate sample size if the cohort is not half a million or more, and, as I understand it, all of this would take a long time to complete. My ultimate prediction regarding this obstacle is that in order to build, in a short amount of time, an IQ predictor with enough quality data to uncover a sufficient abundance of causal alleles for intelligence, there will need to be monetary incentives for the sought hundreds of thousands of participants, actual full-scale IQ tests administered, and full genome sequencing. Again, I would be delighted to be wrong about all of this and I encourage anyone to reply with good reasons for why I might be.
So it sounds like there are potential solutions here and this isn’t necessarily a showstopper, especially if we can derisk using animal testing in cows or pigs.
As mentioned in my reply, I would tend to agree if your goal was to only make a few edits and thus use an AAV only once or twice to accomplish this. This has been demonstrated to be relatively safe provided the right serotype is used, and there are even FDA-approved gene delivery therapies that use AAVs in the CNS. Even in these cases though, the risk of inducing an inflammatory response or killing cells is never zero even with correct dosing and single exposure, and for your purposes you would need to use at least hundreds of AAV injections to deliver hundreds of edits, and thousands of AAV injections to deliver thousands of edits. Again, barring some breakthrough in AAVs as delivery vectors, this number of uses in a single person’s CNS practically guarantees that you will end up inducing some significant/fatal inflammatory response or cytolysis. This is without even mentioning the problems of developing immunity to the viruses and low transduction efficiency, which are another couple of breakthroughs away from being solved.
This is an update for me. I didn’t previously realize that the mRNA for a base or prime editor could itself trigger the innate immune system. I wonder how serious of a concern this would actually be?
You may find these two papers elucidating: one, two
Yeah, one other delivery vector I’ve looked into over the last couple of days is extracellular vesicles. They seem to have basically zero problems with toxicity because the body already uses them to shuttle stuff around. And you can stick peptides on their surface similar to what we proposed with lipid nanoparticles.
This is interesting. This is the first time I’m hearing of these as they pertain to potential gene therapy applications. Here are some papers about them I found that you may find useful as you consider them as an option: one, two, three
Thanks for this example. I don’t think I would be particularly worried about this in the context of off-target edits or indels (provided the distribution is similar to that of naturally occurring mutations), but I can see it potentially being an issue if the intelligence-modifying alleles themselves work via regulating something like the neuron’s internal clock.
To be candid with you, I was mostly just trying to play devil’s advocate regarding mosaicism. Like you mention, neurons accumulate random mutations over the lifespan anyways and it doesn’t seem to be detrimental necessarily, though one can’t disentangle the cognitive decline due to this small-scale mosaicism versus that due to aging in general. It’s also possible that having an order of magnitude increase in mosaicism (e.g., 1,000 random mutations across neurons to 10,000 random mutations across neurons) induces some phase transition in its latent perniciousness. Either way, if you solve either the transduction efficiency or immunological tolerance issues (if low transduction efficiency, just employ multiple rounds of the same edit repeatedly), mosaicism won’t be much of a problem if it was ever going to be one.
You could elect to use proxy measures like educational attainment, SAT/ACT/GRE score, most advanced math class completed, etc., but my intuition is that they are influenced by too many things other than pure g to be useful for the desired purpose. It’s possible that I’m being too cynical about this obstacle and I would be delighted if someone could give me good reasons why I’m wrong.
This is just measurement error and can be handled by normal psychometric approaches like SEM (eg. GSEM). You lose sample efficiency, but there’s no reason you can’t measure and correct for the measurement error. What the error does is render the estimates of each allele too small (closer to zero from either direction), but if you know how much error there is, you can just multiply back up to recover the real effect you would see if you had been able to use measurement with no error. In particular, for an editing approach, you don’t need to know the estimate at all—you only need to know that it is non-zero, because you are identifying the desired allele.
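The disattenuation step being described can be shown with a toy simulation. This is only a sketch: the effect size, reliability, and sample size are made-up parameters, and `reliability` plays the role of the known fraction of proxy variance attributable to the latent trait.

```python
import numpy as np

# Toy demonstration of attenuation and correction: measuring the trait with
# error shrinks each allele's estimated (standardized) effect by a known
# factor, which can then be divided back out. All parameters are illustrative.
rng = np.random.default_rng(0)
n = 100_000
true_beta = 0.2      # per-allele effect on the latent trait, in SD units
reliability = 0.6    # fraction of the proxy's variance reflecting the latent trait

genotype = rng.binomial(2, 0.5, n).astype(float)
latent = true_beta * (genotype - 1.0) + rng.normal(0.0, 1.0, n)
# Standardized noisy proxy: shrink the latent signal, pad with independent error.
proxy = np.sqrt(reliability) * latent + np.sqrt(1.0 - reliability) * rng.normal(0.0, 1.0, n)

attenuated = np.polyfit(genotype, proxy, 1)[0]   # estimate is biased toward zero
corrected = attenuated / np.sqrt(reliability)    # "multiply back up" to the true scale
print(round(attenuated, 3), round(corrected, 3))
```

The regression on the noisy proxy recovers roughly `sqrt(reliability) * true_beta`; dividing by the known shrinkage factor recovers the effect one would have seen with an error-free measurement, at the cost of a larger standard error.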
So, every measurement on every individual you get, whether it’s EA or SAT or GRE or parental degree or a 5-minute web quiz, helps you narrow down the set of 10,000 alleles that matters from the starting set of a few million. They just might not narrow it down much, so it becomes a more decision-theoretic question of how expensive is which data to collect and what maximizes your bang-for-buck. (Historically, the calculus has favored low-quality measurements which could be collected on a large number of people.)
The problem could potentially be solved by conducting GWASes that identify the SNPs of things known to correlate with the proxy measure other than intelligence and then subtracting those SNPs, but like you mention later in your reply, the question is which approach is faster and/or cheaper. Unless there is some magic I don’t know about in GSEM, I can’t see a convincing reason why it would make intelligence SNPs buoy to the top of lists ranked by effect size, especially with the sample size we would likely end up working with (<1 million). If you don’t know which SNPs contribute to intelligence versus something else, applying a flat correction factor to each allele’s effect size would just rescale the estimates rather than help distill out the intelligence SNPs. Considering the main limitation of this project is the number of edits they’re wanting to make, minimizing the number of allele flips while maximizing the effect on intelligence is one of the major goals here (although I’ve already stated why I think this project is infeasible). Another important thing to consider is that the statistical significance of SNPs’ effects would be attenuated as the number of independent traits affecting the phenotype increases; if you’re only able to get 500,000 data points for a GWAS that uses SAT score as the phenotype, you will most likely have the majority of causal intelligence SNPs falling below the genome-wide significance threshold of p < 5×10^-8.
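To put rough numbers on the sample-size worry, here is a back-of-the-envelope power calculation under a normal approximation. The per-allele effect and allele frequency are illustrative guesses, not estimates from any real GWAS.

```python
import math

# Approximate power to detect one causal SNP at the genome-wide threshold
# p < 5e-8, under an additive model and a normal approximation.
Z_ALPHA = 5.45  # ~ two-sided normal quantile corresponding to p = 5e-8

def gwas_power(n: int, beta_sd: float, maf: float) -> float:
    """Power for a SNP with standardized effect beta_sd and minor allele frequency maf."""
    # Phenotypic variance explained by the SNP under an additive model.
    var_explained = 2.0 * maf * (1.0 - maf) * beta_sd ** 2
    ncp = math.sqrt(n * var_explained)  # expected z-statistic
    # Upper-tail probability P(Z > Z_ALPHA - ncp); the opposite tail is negligible.
    return 0.5 * math.erfc((Z_ALPHA - ncp) / math.sqrt(2.0))

for n in (100_000, 500_000, 1_000_000):
    print(n, round(gwas_power(n, beta_sd=0.02, maf=0.3), 3))
```

Under these assumed parameters, power at n = 100,000 is well under 10%, while it is essentially 100% at n = 500,000, which is consistent with the half-million figure argued for above.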
It’s also possible that optimizing peoples’ brains (or a group of embryos) for acing the SAT to the point where they have a 100% chance of achieving this brings us as close to a superintelligent human as we need until the next iteration of superintelligent human.
The tragedy of all of this is that it’s basically a money problem—if some billionaire could just unilaterally fund genome sequencing and IQ testing en masse and not get blocked by some government or other bureaucratic entity, all of this crap about building an accurate predictor would disappear and we’d only ever need to do this once.
The problem could potentially be solved by conducting GWASes that identify the SNPs of things known to correlate with the proxy measure other than intelligence and then subtracting those SNPs
More or less. If you have an impure measurement like ‘years of education’ which lumps in half intelligence and half other stuff (and you know this, even if you never have measurements of IQ and EDU and the other-stuff within individuals, because you can get precise genetic correlations from much smaller sample sizes where you compare PGSes & alternative methods like GCTA or cross-twin correlations), then you can correct the respective estimates of both intelligence and other-stuff, and you can pool with other GWASes on other traits/cohorts to estimate all of these simultaneously. This gets you estimates of each latent trait effect size per allele, and you just rank and select.
you will most likely have the majority of causal intelligence SNPs falling below the genome-wide significance threshold of p < 5×10^-8.
A statistical-significance threshold is irrelevant NHST mumbo-jumbo. What you care about is posterior probability of the causal variant’s effect being above the cost-safety threshold, whatever that may be, but which will have nothing at all to do with ‘genome-wide statistical significance’.
A statistical-significance threshold is irrelevant NHST mumbo-jumbo. What you care about is posterior probability of the causal variant’s effect being above the cost-safety threshold, whatever that may be, but which will have nothing at all to do with ‘genome-wide statistical significance’.
I’m aware of this, but if you’re just indiscriminately shoveling heaps of edits into someone’s genome based on a GWAS with too low a sample size to reveal causal SNPs for the desired trait, you’ll be editing a whole bunch of what are actually tags, a whole bunch of things that are related to independent traits other than intelligence, and a whole bunch of random irrelevant alleles that made it into your selection by random chance. This is a sure-fire way to make a therapy that has no chance of working, and if an indiscriminate shotgun approach like this is used in experiments, the combinatorics of the matter dictates that there are more possible sure-to-fail multiplex genome editing therapies than there are humans on the Earth, let alone humans willing to be guinea pigs for an experiment like this. Having a statistical significance threshold imposes a bar for SNPs to pass that at least makes the therapy less of a guaranteed suicide mission.
if you’re just indiscriminately shoveling heaps of edits into someone’s genome based on a GWAS with too low a sample size to reveal causal SNPs for the desired trait, you’ll be editing a whole bunch of what are actually tags,
What I said was “What you care about is posterior probability of the causal variant’s effect being above the cost-safety threshold”. If you are ‘indiscriminately shoveling’, then you apparently did it wrong.
a whole bunch of things that are related to independent traits other than intelligence,
Pretty much all SNPs are related to something or other. The question is what is the average effect. Given the known genetic correlations, if you pick the highest posterior probability ones for intelligence, then the average effect will be good.
(And in any case, one should be aiming for maximizing the gain across all traits as an index score.)
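This selection rule can be sketched in a few lines: rank candidate edits by posterior-expected gain rather than by a significance cutoff. Here the gain is for a single trait; extending to an index score is just a weighted sum across traits. All allele IDs, posteriors, and effect sizes are hypothetical.

```python
# Illustrative ranking of candidate edits by posterior-expected gain,
# rather than by a p-value threshold. All numbers are made up.
candidates = [
    # (allele id, posterior P(causal), estimated effect if causal, in IQ-point equivalents)
    ("rsA", 0.95, 0.3),
    ("rsB", 0.40, 1.0),
    ("rsC", 0.05, 2.0),
    ("rsD", 0.90, 0.5),
]

# Expected gain per edit = posterior probability of causality times effect size.
ranked = sorted(candidates, key=lambda c: c[1] * c[2], reverse=True)
print([c[0] for c in ranked])  # -> ['rsD', 'rsB', 'rsA', 'rsC']
```

Note that rsC, despite having the largest nominal effect, ranks last because its posterior is tiny; a hard significance threshold would instead make a binary include/exclude call that ignores this trade-off.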
and a whole bunch of random irrelevant alleles that made it into your selection by random chance.
If they’re irrelevant, then there’s no problem.
This is a sure-fire way to make a therapy that has no chance of working,
No it’s not. If you’re using common SNPs which already exist, why would it ‘have no chance of working’? If some random SNP had some devastating effect on intelligence, then it would not be ranked high.
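To make this concrete, here is a toy sketch of ranking variants by posterior expected effect rather than by p-value (the prior variance, SNP names, and all numbers are made up purely for illustration, not taken from any real GWAS):

```python
import math

# Toy empirical-Bayes shrinkage: assume a normal prior on true effects with
# variance tau2 and normal sampling error with variance se^2. The posterior
# mean is then the familiar shrinkage estimate, and a precisely-estimated
# small effect can outrank a noisy large one.

def posterior_mean(beta_hat, se, tau2):
    """Shrunken estimate of a variant's true effect."""
    return beta_hat * tau2 / (tau2 + se ** 2)

def prob_effect_above(beta_hat, se, tau2, threshold):
    """Posterior probability that the true effect exceeds `threshold`."""
    post_var = tau2 * se ** 2 / (tau2 + se ** 2)
    mu = posterior_mean(beta_hat, se, tau2)
    z = (threshold - mu) / math.sqrt(post_var)
    # standard normal survival function via erfc
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical SNPs: (name, estimated effect, standard error)
snps = [("rsA", 0.05, 0.01), ("rsB", 0.20, 0.15)]
tau2 = 0.01 ** 2  # hypothetical prior variance of true effects

# rsB has the bigger point estimate, but rsA wins after shrinkage because
# its estimate is far more precise.
ranked = sorted(snps, key=lambda s: -posterior_mean(s[1], s[2], tau2))
print(ranked[0][0])
```

Note this ranks by expected effect; you could just as well rank by `prob_effect_above` with whatever cost-safety threshold you care about, which is the quantity gwern describes.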
Even out of this 10%, slightly less than 10% of that 10% responded to a 98-question survey, so a generous estimate of how many of their customers they got to take this survey is 1%. And this was just a consumer experience survey, which does not have nearly as much emotional and cognitive friction dissuading participants as something like an IQ test.
What if 23&me offered a $20 discount for uploading old SAT scores? I guess someone would set up a site that generates realistically distributed fake SAT scores that everyone would use. Is there a standardized format for results that would be easy to retrieve and upload but hard to fake? Eh, idk, maybe not. Could a company somehow arrange to buy the scores of consenting customers directly from the testing agency? Agree that this seems hard.
Statistical models like those involved in GWASes follow a simple rule: crap in, crap out. If you want to find a lot of statistically significant SNPs for intelligence and you try using a shoddy proxy like standardized test score or an incomplete IQ test score as your phenotype, your GWAS is going to end up producing a bunch of shoddy SNPs for “intelligence”. Sample size (which is still an unsolved problem for the aforementioned reasons) has the potential to make up for obtaining a low number of SNPs that reach genome-wide significance, but it won’t get rid of entangled irrelevant SNPs if you’re measuring something other than straight-up full-scale IQ.
This seems unduly pessimistic to me. The whole interesting thing about g is that it’s easy to measure and correlates with tons of stuff. I’m not convinced there’s any magic about FSIQ compared to shoddier tests. There might be important stuff that FSIQ doesn’t measure very well that we’d ideally like to select/edit for, but using FSIQ is much better than nothing. Likewise, using a poor man’s IQ proxy seems much better than nothing.
I wouldn’t call it magic, but what makes FSIQ tests special is that they’re specifically crafted to estimate g. To your point, anything that involves intelligence (SAT, ACT, GRE, random trivia quizzes, tying your shoes) will positively correlate with g even if only weakly, but the correlations between g factor scores and full-scale IQ scores from the WAIS have been found to be >0.95, according to the same Wikipedia page you linked in a previous reply to me. Like both of us mentioned in previous replies, using imperfect proxy measures would necessitate multiplying your sample size because of diluted p-values and effect sizes, along with selecting for many things that are not intelligence. There are more details about this in my reply to gwern’s reply to me.
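As a back-of-envelope sketch of that sample-size multiplication (assuming the variant’s effect on the proxy is simply attenuated by the proxy’s correlation with g, which ignores genetic-correlation subtleties and is not a real power analysis):

```python
# If a proxy test correlates with g at r, a variant's effect on the proxy
# is attenuated to roughly r * beta, so the variance it explains shrinks
# by r^2. Since detection power scales with n * beta^2, the sample size
# needed to detect the same variant inflates by about 1 / r^2.

def sample_size_multiplier(r_proxy):
    return 1.0 / r_proxy ** 2

# Correlations below are the figures discussed in this thread.
for name, r in [("WAIS FSIQ (vs g)", 0.95), ("SAT", 0.82), ("ACT", 0.73)]:
    print(f"{name}: ~{sample_size_multiplier(r):.2f}x the samples needed")
```

So even before worrying about selecting for non-g covariates, a 0.82 proxy costs you roughly half again as many samples.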
This seems unduly pessimistic to me. The whole interesting thing about g is that it’s easy to measure and correlates with tons of stuff. I’m not convinced there’s any magic about FSIQ compared to shoddier tests. There might be important stuff that FSIQ doesn’t measure very well that we’d ideally like to select/edit for, but using FSIQ is much better than nothing. Likewise, using a poor man’s IQ proxy seems much better than nothing.
This may have missed your point; you seem more concerned about selecting for unwanted covariates than ‘missing things’, which is reasonable. I might remake the same argument by suspecting that FSIQ probably has some weird covariates too—but that seems weaker. E.g. if a proxy measure correlates with FSIQ at .7, then the ‘other stuff’ (insofar as it is heritable variation and not just noise) will also correlate with the proxy at .7, and so by selecting on this measure you’d be selecting quite strongly for the ‘other stuff’, which, yeah, isn’t great. FSIQ, insofar as it has any weird unwanted covariates, would probably be much less correlated with them than .7.
I might end up eating my words on the delivery problem. Something came out just a few days ago that renewed a bit of my optimism, see here. According to the findings in this pre-print, it is possible to shield AAVs from the immune system using protein vaults that the immune system recognizes as self. It is not perfect, though; although VAAV results in improved transduction efficiency even in the presence of neutralizing antibodies, it still only results in transduction of ~4% of cells if neutralizing antibodies are present. This means you’d need to cross your fingers and hope that 1) the patient doesn’t already have naturally extant neutralizing antibodies and 2) they don’t develop them over the course of the hundreds/thousands of VAAV doses you’re going to give them. In the paper, it is stated that AAV gets packaged in the vaults only to an extent rather than completely. So, more than likely, even if you’re injecting 99% VAAV and 1% naked AAV, if you do this 100 times you are almost sure to develop neutralizing antibodies to that 1% of naked AAV (unless they have a way to completely purify VAAV that removes all naked AAV). One way to combat the transduction problem post-inoculation is using multiple injections of the same edit to approximate 100% transduction, though I’m pessimistic that this will work because there is probably a good reason that only 4% of cells were transducible; something might be different about them compared to the rest of the cells, so you might receive diminishing transduction returns with each injection. They also still need to demonstrate that these work in vivo and that they can be routed to the CNS. Nonetheless, I’m excited to see how this shakes out.
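For what it’s worth, here is the repeated-injection arithmetic under the optimistic independence assumption I just said I doubt (the ~4% per-dose figure is from the pre-print; everything else is hypothetical):

```python
# Assumes (optimistically, and contrary to my suspicion about the
# untransduced cells being systematically different) that each dose
# independently transduces 4% of the not-yet-transduced cells.

def cumulative_transduction(per_dose=0.04, doses=1):
    return 1.0 - (1.0 - per_dose) ** doses

for n in (1, 10, 50, 100):
    print(f"{n:4d} doses -> ~{cumulative_transduction(doses=n):.1%} of cells")
```

Under independence you’d approach full transduction around 100 doses; if the 4% is instead a fixed transducible subpopulation, you never get past 4% no matter how many doses you give.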
Thanks for leaving such thorough and thoughtful feedback!
You could elect to use proxy measures like educational attainment, SAT/ACT/GRE score, most advanced math class completed, etc., but my intuition is that they are influenced by too many things other than pure g to be useful for the desired purpose. It’s possible that I’m being too cynical about this obstacle and I would be delighted if someone could give me good reasons why I’m wrong.
The SAT is heavily g-loaded: r = .82 according to Wikipedia, so ~2/3 of the variance is coming from g, ~1/3 from other stuff (minus whatever variance is testing noise). So naively, assuming no noise and that the genetic correlations mirror the phenotype correlations, if you did embryo selection on SAT, you’d be getting .82*h_pred/sqrt(2) SDs g and .57*h_pred/sqrt(2) SDs ‘other stuff’ for every SD of selection power you exert on your embryo pool (h_pred^2 is the variance in SAT explained by the predictor, we’re dividing by sqrt(2) because sibling genotypes have ~1/2 the variance as the wider population). Which is maybe not good; maybe you don’t want that much of the ‘other stuff’, e.g. if it includes personality traits.
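The arithmetic above, written out (the `h_pred` value below is a placeholder for the predictor’s accuracy, not an estimate of anything real):

```python
import math

# h_pred^2 = variance in SAT explained by the predictor; the sqrt(2)
# reflects sibling/embryo genotypes having ~half the population variance.

def selection_gains(r_g=0.82, h_pred=0.6, selection_sd=1.0):
    r_other = math.sqrt(1.0 - r_g ** 2)  # ~0.57 for r_g = 0.82
    gain_g = r_g * h_pred / math.sqrt(2) * selection_sd
    gain_other = r_other * h_pred / math.sqrt(2) * selection_sd
    return gain_g, gain_other

g, other = selection_gains()
print(f"per SD of selection: ~{g:.2f} SD g, ~{other:.2f} SD 'other stuff'")
```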
It looks like the SAT isn’t correlated much with personality at all. The biggest correlation is with openness, which is unsurprising due to the correlation between openness and IQ—I figured conscientiousness might be a bit correlated, but it’s actually slightly anticorrelated, despite being correlated with GPA. So maybe it’s more that you’re measuring specific abilities as well as g (e.g. non-g components of math and verbal ability).
Another thing: if you have a test for which g explains the lion’s share of the heritable variance, but there are also other traits which contribute heritable variance, and the other traits are similarly polygenic as g (similar number of causal variants), then by picking the top-N expected effect size edits, you’ll probably mostly/entirely end up editing variants which affect g. (That said, if the other traits are significantly less polygenic than g then the opposite would happen.)
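A toy Monte Carlo of this argument (the variant counts and variance shares are invented purely for illustration, and the variants for g and the ‘other’ trait are assumed disjoint):

```python
import math
import random

# g contributes share_g of the heritable variance via m_g causal variants;
# the 'other' trait contributes the rest via m_o variants. Per-variant
# effects are drawn N(0, share/M), so equal polygenicity means g variants
# have larger typical effects and dominate the top-N list.

random.seed(0)

def top_n_fraction_g(m_g=1000, m_o=1000, share_g=0.8, n=100):
    variants = []
    for _ in range(m_g):
        variants.append(("g", random.gauss(0, math.sqrt(share_g / m_g))))
    for _ in range(m_o):
        variants.append(("other", random.gauss(0, math.sqrt((1 - share_g) / m_o))))
    top = sorted(variants, key=lambda v: -abs(v[1]))[:n]
    return sum(1 for trait, _ in top if trait == "g") / n

print(top_n_fraction_g())        # equal polygenicity: top-N is nearly all g
print(top_n_fraction_g(m_o=50))  # much less polygenic 'other': it grabs more slots
```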
this would be extremely expensive, as even the cheapest professional IQ tests cost at least $100 to administer
Getting old SAT scores could be much cheaper, I imagine (though doing this would still be very difficult). Also, as GeneSmith pointed out we aren’t necessarily limited to western countries. Assembling a large biobank including IQ scores or a good proxy might be much cheaper and more socially permissible elsewhere.
The barriers involved in engineering the delivery and editing mechanisms are different beasts.
I do basically expect the delivery problem will gated by missing breakthroughs, since otherwise I’d expect the literature to be full of more impressive results than it actually is. (E.g. why has no one used angiopep coated LNPs to deliver editors to mouse brains, as far as I can find? I guess it doesn’t work very well? Has anyone actually tried though?)
Ditto for editors, though I’m somewhat more optimistic there for a handful of reasons:
sequence dependent off-targets can be predicted
so you can maybe avoid edits that risk catastrophic off-targets
unclear how big of a problem errors at noncoding target sites will be (though after reading some replies pointing out that regulatory binding sites are highly sensitive I’m a bit more pessimistic about this than I was)
even if they are a big problem, dCas9-based ABEs have extremely low indel rates and incorrect base conversions, though bystanders are still a concern
though if you restrict yourself to ABEs and are careful to avoid bystanders, your pool of variants to target has shrunk way down
I mean, your basic argument was “you’re trying to do 1000 edits, and the risks will mount with each edit you do”, which yeah, maybe I’m being too optimistic here (e.g. even if not a problem at most target sites, errors will predictably be a big deal at some target sites, and it might be hard to predict which sites with high accuracy).
It’s not clear to me how far out the necessary breakthroughs are “by default” and how much they could be accelerated if we actually tried, in the sense of how electric cars weren’t going anywhere until Musk came along and actually tried (though besides sounding crazy ambitious, maybe this analogy doesn’t really work if breakthroughs are just hard to accelerate with money, and AFAIK electric cars weren’t really held up by any big breakthroughs, just lack of scale). Getting delivery+editors down would have a ton of uses besides intelligence enhancement therapy; you could target any mono/oligo/poly-genic diseases you wanted. It doesn’t seem like the amount of effort currently being put in is commensurate with how much it would be worth, even putting ‘enhancement’ use cases aside.
one could imagine that if every 3rd or 4th or nth neuron is receiving, processing, or releasing ligands in a different way than either the upstream or downstream neurons, the result is some discordance that is more likely to be destructive than beneficial
My impression is neurons are really noisy, and so probably not very sensitive to small perturbations in timing / signalling characteristics. I guess things could be different if the differences are permanent rather than transient—though I also wouldn’t be surprised if there was a lot of ‘spatial’ noise/variation in neural characteristics, which the brain is able to cope with. Maybe this isn’t the sort of variation you mean. I completely agree that its more likely to be detrimental than beneficial, it’s a question of how badly detrimental.
Another thing to consider: do the causal variants additively influence an underlying lower dimensional ‘parameter space’ which then influences g (e.g. degree of expression of various proteins or characteristics downstream of that)? If this is the case, and you have a large number of causal variants per ‘parameter’, then if your cells get each edit with about the same frequency on average, then even if there’s a ton of mosaicism at the variant level there might not be much at the ‘parameter’ level. I suspect the way this would actually work out is that some cells will be easier to transfect than others (e.g. due to the geography of the extracellular space that the delivery vectors need to diffuse through), so you’ll have some cells getting more total edits than others: a mix of cells with better and worse polygenic scores, which might lead to the discordance problems you suggested if the differences are big enough.
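A quick simulation of the variant-level vs ‘parameter’-level point (this assumes a uniform per-cell edit probability, which the second half of the paragraph argues is unrealistic; all numbers are hypothetical):

```python
import random
import statistics

# Each cell independently receives each of V edits with probability p, and
# a downstream 'parameter' is the fraction of its V variants edited. With
# many variants per parameter, cell-to-cell spread in the parameter is
# small even though variant-level mosaicism is large.

random.seed(0)

def parameter_spread(variants_per_param, p=0.5, cells=2000):
    values = []
    for _ in range(cells):
        edited = sum(1 for _ in range(variants_per_param) if random.random() < p)
        values.append(edited / variants_per_param)
    return statistics.pstdev(values)

print(parameter_spread(1))    # one variant per parameter: spread ~0.5
print(parameter_spread(100))  # many variants per parameter: spread ~0.05
```

The standard deviation across cells shrinks like 1/sqrt(V), which is the averaging effect described above; correlated transfection (some cells getting systematically more edits) would break this.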
For all of the reasons herein and more, it’s my personal prediction that the only ways humanity is going to get vastly smarter by artificial means is through brain machine interfaces or iterative embryo selection.
BMI seems harder than in-vivo editing to me. Wouldn’t you need a massive number of connections (10M+?) to even begin having any hope of making people qualitatively smarter? Wouldn’t you need to find an algorithm that the brain could ‘learn to use’ so well that it essentially becomes integrated as another cortical area or can serve as an ‘expansion card’ for existing cortical areas? Would you just end up bottlenecked by the characteristics of the human neurons (e.g. low information capacity due to noise)?
The SAT is heavily g-loaded: r = .82 according to Wikipedia, so ~2/3 of the variance is coming from g, ~1/3 from other stuff (minus whatever variance is testing noise). So naively, assuming no noise and that the genetic correlations mirror the phenotype correlations, if you did embryo selection on SAT, you’d be getting .82*h_pred/sqrt(2) SDs g and .57*h_pred/sqrt(2) SDs ‘other stuff’ for every SD of selection power you exert on your embryo pool (h_pred^2 is the variance in SAT explained by the predictor, we’re dividing by sqrt(2) because sibling genotypes have ~1/2 the variance as the wider population). Which is maybe not good; maybe you don’t want that much of the ‘other stuff’, e.g. if it includes personality traits.
The article that Wikipedia cites for that factoid, Frey & Detterman 2004, uses the National Longitudinal Survey of Youth 1979 for its data, which included the SAT and ASVAB scores for the samples (the ASVAB is what they used to estimate IQ, so one first needs to find the correlation between the ASVAB and actual FSIQ). This introduces the huge caveat that the SAT has changed drastically since this study was conducted and has likely been far less strongly correlated with g ever since 1994. This is when they began recentering scores and changing the scoring methodology, making year-to-year comparisons of scores no longer apples to apples. The real killer was their revision of the math and verbal sections to mostly include questions that “approximate more closely the skills used in college and high school work”, get rid of “contrived word problems” (e.g., the types of verbal ability questions you’d see on an IQ test), and include “real-world” problems that may be more relevant to students. Since it became more focused on assessing knowledge rather than aptitude, this overhaul of the scoring and question format made it much more closely reflect a typical academic benchmark exam rather than an assessment of general cognitive ability. This decreased its predictive power for general intelligence and increased its predictive power for high school GPA, as well as for other things that correlate with high school GPA like academic effort, openness, and SES. It’s for these reasons that Mensa and other psychometrics societies stopped accepting the SAT as a proxy for IQ unless you took it prior to 1994. I’ve taken both the SAT and ACT and I cannot imagine the ACT is much better (a 2004 study showed r=0.73).
My guess is that the GRE would be much more correlated with general intelligence than either of the other two tests (still imperfectly so, wouldn’t put it >0.8), but then the problem is that a much smaller fraction of the population has taken the GRE and there is a large selection bias as to who takes it. Same with something like the LSAT. I still think the only way you will get away with cheaply assessing general intelligence is via an abridged IQ test such as that offered by openpsychometrics.org if it was properly normed and made to be a little longer.
Another thing: if you have a test for which g explains the lion’s share of the heritable variance, but there are also other traits which contribute heritable variance, and the other traits are similarly polygenic as g (similar number of causal variants), then by picking the top-N expected effect size edits, you’ll probably mostly/entirely end up editing variants which affect g. (That said, if the other traits are significantly less polygenic than g then the opposite would happen.)
I agree, but then you’re limiting yourself to whatever number of polymorphisms are left over after what is presumably a pseudo-arbitrary threshold, and you’d need a much larger sample size because the effect sizes and p-values of SNPs would be diluted because you’d now have many more polymorphisms contributing to the phenotype. Like you suggest, it is also a large inferential leap to assume this would exclusively result in variants that affect g. Refer to my reply to gwern for more about this.
Getting old SAT scores could be much cheaper, I imagine (though doing this would still be very difficult). Also, as GeneSmith pointed out we aren’t necessarily limited to western countries. Assembling a large biobank including IQ scores or a good proxy might be much cheaper and more socially permissible elsewhere.
Refer to the first paragraph of this reply and my reply to GeneSmith.
Ditto for editors, though I’m somewhat more optimistic there for a handful of reasons: (etc)
I agree, I think the delivery problem is a much taller mountain to climb than the editor problem. One of the reasons for this is the fact that editing is generally a tractable organic chemistry problem and delivery is almost exclusively an intractable systems biology problem. Considering the progress that precision genome editing tools have made in the past 10 years, I think it is reasonable to rely on other labs to discover ways to shave down the noxious effects of editing alone to near negligibility.
It’s not clear to me how far out the necessary breakthroughs are “by default” and how much they could be accelerated if we actually tried...etc
As you alluded to, the difference is that one thing was basically solved already. Making leaps forward in biology requires an insane amount of tedium and luck. Genius is certainly important too, but like with the editing versus delivery tractability problem, engineering things like batteries involves more tractable sub-problems than getting things to work in noisy, black box, highly variable wetware like humans.
BMI seems harder than in-vivo editing to me. Wouldn’t you need a massive number of connections (10M+?) to even begin having any hope of making people qualitatively smarter? Wouldn’t you need to find an algorithm that the brain could ‘learn to use’ so well that it essentially becomes integrated as another cortical area or can serve as an ‘expansion card’ for existing cortical areas? Would you just end up bottlenecked by the characteristics of the human neurons (e.g. low information capacity due to noise)?
Frankly, I know much less about this topic than the other stuff I’ve been talking about, so my opinions on BMIs are less strongly held, but what has made me optimistic about such things is the existence of brain implants that have cured people’s depression, work showing that transcranial magnetic stimulation has the potential to enhance certain cognitive domains, and existing BMIs that cure paralysis at the level of the motor cortex. Like other things I mentioned, this also seems like a somewhat more tractable problem, considering computational neuroscience is a very math-intensive field of study and AI has vast potential to assist us in figuring it out. If the problem eventually comes down to needing more and more connections, I cannot imagine it will remain a problem for long, since it sounds relatively easier to figure out how to insert more fine connections into the brain than the stuff we’ve been discussing.
Another thing: if you have a test for which g explains the lion’s share of the heritable variance, but there are also other traits which contribute heritable variance, and the other traits are similarly polygenic as g (similar number of causal variants), then by picking the top-N expected effect size edits, you’ll probably mostly/entirely end up editing variants which affect g. (That said, if the other traits are significantly less polygenic than g then the opposite would happen.)
I should mention, when I wrote this I was assuming a simple model where the causal variants for g and the ‘other stuff’ are disjoint, which is probably unrealistic—there’d be some pleiotropy.
Ignoring that this would take years because this population of 500,000 people would come from what I’d imagine is a very sparsely distributed subset of the broader population, this would be extremely expensive, as even the cheapest professional IQ tests cost at least $100 to administer, which already puts your project (or whatever entity funds these tests) at least $50,000,000 in the hole before you’ve even begun designing and conducting experiments. You could elect to use proxy measures like educational attainment, SAT/ACT/GRE score, most advanced math class completed, etc., but my intuition is that they are influenced by too many things other than pure g to be useful for the desired purpose. It’s possible that I’m being too cynical about this obstacle and I would be delighted if someone could give me good reasons why I’m wrong.
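Spelling out the cost arithmetic (the genotyping prices below are my own placeholders; bulk prices vary a lot by volume and vendor):

```python
# Testing-plus-genotyping cost for the hypothetical 500,000-sample study.

samples = 500_000
iq_test_cost = 100   # cheapest professionally administered IQ test, per text
snp_array_cost = 50  # hypothetical bulk price per SNP array
wgs_cost = 300       # hypothetical bulk price per whole-genome sequence

print(f"IQ testing alone: ${samples * iq_test_cost:,}")  # prints $50,000,000
print(f"+ SNP arrays:     ${samples * (iq_test_cost + snp_array_cost):,}")
print(f"+ WGS instead:    ${samples * (iq_test_cost + wgs_cost):,}")
```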
The barriers involved in engineering the delivery and editing mechanisms are different beasts. HiddenPrior already did a satisfactory job of outlining these, though he could have been more elaborate. At the risk of being just as opaque, I will just give the bottom line, because a fully detailed explanation of why this is infeasible would require more text than I want to write. As far as delivery goes, the current state of these technologies will force you to use lipid nanoparticles because of the dangers of an inflammatory response being induced in the brain by an AAV, not to mention the risk of random cell death induction by AAVs, the causes of which are poorly understood. Your risk appetite for such things must be extremely low considering you do not want to sustain cell death in the brain if you want to keep subjects alive, never mind imparting intelligence enhancements. Finding an AAV serotype that does not carry these risks would be a breakthrough on its own; finding an AAV serotype, or a sufficiently abundant collection of AAV serotypes, that is immune to neutralization following repeated exposures is another breakthrough; and finding an AAV that could encode all of the edits you want to make (obviating the need for multiple injections) is yet another breakthrough, and frankly is probably impossible. Even with all of that overcome, you would still have to contend with low transduction efficiency and the massive costs of producing many different custom AAVs at scale, barring yet more breakthroughs. As you mention, such challenges have a chance of being solved by the market eventually, though who knows when that will be, if it ever happens at all. These would not be as big of issues if you only wanted to make a few edits, but you want to make hundreds or thousands, which necessitates multiple rounds of AAV administration and exponentially compounds the chance that these risks are realized.
After having been in the field and witnessing how this type of research is performed, what goes on under the hood at research labs, how slow progress is, and the biochemistry of such vectors, my personal take is that attempting to solve all of these problems for AAVs is akin to trying to optimize a horse for the road when what you really need is a car, and that car is probably going to end up being lipid nanoparticles. They’re cheaper, safer, intrinsically much less immunogenic, and more capacious. You will need to use plasmid DNA (as opposed to mRNA, which is where lipid nanoparticles currently shine) if you want to keep them non-immunogenic and avoid the same immunogenicity risks as AAVs, which will significantly reduce your transduction efficiency barring another breakthrough. Lipid nanoparticles, even though they’re generally much safer, still have the potential to be immunogenic or toxic following repeated doses or high enough concentrations, which is another hurdle because you will need to use them repeatedly considering the number of edits you’re wanting to make.
I have not even gotten to why base/prime editing, as those methods currently exist, will be problematic for the number of edits you’re wanting to make, but I will spare you because my rhetoric is getting repetitive; it basically boils down to what was already mentioned in your post, the replies, and the previous paragraph, namely that the more edits you make, the greater the chance that risks will be realized, in this case meaning things like pseudo-random off-target effects, bystander edits, and guide RNA collisions with similar loci in the genome. I also disagree with both the OP and HiddenPrior regarding the likelihood that mosaicism will be a problem. A simple thought experiment may change your mind about mosaicism in the brain: consider what would happen in the case of editing multiple loci (whether purposeful or accidental) that happen to play a role in a neuron’s internal clock. If you have a bunch of neurons releasing substrates that govern one’s circadian rhythm in a totally discordant manner, I’d have to imagine the outcome is that the organism’s circadian rhythm will be just as discordant. This can be extrapolated to signaling pathways in general among neurons, where again one could imagine that if every 3rd or 4th or nth neuron is receiving, processing, or releasing ligands in a different way than either the upstream or downstream neurons, the result is some discordance that is more likely to be destructive than beneficial. This is just a conjecture, albeit an informed one, based on my knowledge of neurobiology, and another case where I’d be delighted if someone could give me good reasons why I might be wrong.
For all of the reasons herein and more, it’s my personal prediction that the only ways humanity is going to get vastly smarter by artificial means is through brain machine interfaces or iterative embryo selection. There are many things in the OP that I could nitpick but that do not necessarily contribute to why I think this project is infeasible, and I don’t want to make this gargantuan reply any longer than it already is. I hope I wrote enough to give you a satisfactory answer for why I think this is infeasible; I would be glad to chat over email or discord if you would like to filter ideas through me after reading this.
I am more optimistic than you here. I think it is enough to get people who have already gotten their genomes sequenced through 23&Me or some other such consumer genomics service to either take an online IQ test or submit their SAT scores. You could also cross-check this with other data such people submit to validate their answer and determine whether it is plausible.
I think this could potentially be done for a few million dollars rather than 50. In fact companies like GenomeLink.io already have these kind of third party data analysis services today.
Also, we aren’t limited to western countries. If China or Taiwan or Japan or any other country creates a good IQ predictor, it can be used for editing purposes. Ancestry doesn’t matter much for editing purposes, only for embryo selection.
Would the quality of such tests be lower than those of professionally administered IQ tests?
Of course. But sample size cures many ills.
I briefly looked into this and found these papers:
Adeno-Associated virus induces apoptosis during coinfection with adenovirus
I asked GPT4 whether adenoviruses enter the brain:
I also found this paper indicating much more problematic direct effects observed in mouse studies:
AAV ablates neurogenesis in the adult murine hippocampus
Also:
So it sounds like there are potential solutions here and this isn’t necessarily a showstopper, especially if we can derisk using animal testing in cows or pigs.
This is an update for me. I didn’t previously realize that the mRNA for a base or prime editor could itself trigger the innate immune system. I wonder how serious of a concern this would actually be?
If it is serious, we could potentially deliver RNPs directly to the cells in question. I think this would be plausible to do with pretty much any delivery vector except AAVs.
I don’t really see how delivering a plasmid with the DNA for the editor will be any better than delivering mRNA. The DNA will be transcribed into the exact same mRNA you would have been delivering anyways, so if the mRNA for CRISPR triggers the innate immune system thanks to CpG motifs or something, putting it in a plasmid won’t help much.
Yeah, one other delivery vector I’ve looked into over the last couple of days is extracellular vesicles. They seem to have basically zero problems with toxicity because the body already uses them to shuttle stuff around. And you can stick peptides on their surface, similar to what we proposed with lipid nanoparticles.
The downside is they are harder to manufacture. You can make lipid nanoparticles by literally putting 4 ingredients plus mRNA inside a flask together and shaking it. ECVs require manufacturing via human cell colonies and purification.
Thanks for this example. I don’t think I would be particularly worried about this in the context of off-target edits or indels (provided the distribution is similar to that of naturally occurring mutations), but I can see it potentially being an issue if the intelligence-modifying alleles themselves work via regulating something like the neuron’s internal clock.
If this turns out to be an issue, one potential solution would be to exclude edits to genes that are problematic when mosaic. But this would probably be pretty difficult to validate in an animal model so that might just kill the project.
I have experience attempting things like what you’re suggesting 23andMe do; I briefly ran a startup unrelated to genomics, and I also ran a genomics study at my alma mater. Both of these involved trying to get consumers or test subjects to engage with links, emails, online surveys, tests, etc., and let me be the first to tell you that this is hard for any survey longer than your average customer satisfaction survey.

If 23andMe has ~14 million customers worldwide and they launch a campaign that aims to estimate the IQ scores of their existing customers using an abridged online IQ test (which would take at least ~15-20 minutes if it is at all useful), it is optimistic to think they will get even 140,000 customers to respond. This prediction has an empirical basis: 23andMe conducted a consumer experience survey in 2013 and invited the customers most likely to respond: those who were over the age of 30, had logged into their 23andMe.com account within the two-year period prior to November 2013, were not part of any other 23andMe disease research study, and had opted to receive health results. This amounted to an anemic 20,000 invitations out of its hundreds of thousands of customers; considering 23andMe is cited as having had ~500,000 customers in 2014, we can reasonably assume they had at least ~200,000 customers in 2013. To make our estimate of the invitation rate generous, we will say they had 200,000 customers in 2013, meaning 10% of their customers received an invitation to complete the survey. Even out of that 10%, slightly less than 10% responded to the 98-question survey, so a generous estimate of the fraction of their customer base they got to take this survey is 1%. And this was just a consumer experience survey, which does not have nearly as much emotional and cognitive friction dissuading participants as something like an IQ test. It is counterintuitive and demoralizing, but anyone who has experience with these kinds of things will tell you the same.
If 23andMe instead asked customers to submit SAT/ACT/GRE scores, there are now many other problems to account for (beyond a likely response rate of <=1% of the total customer base): dishonest or otherwise unreliable reporting; selecting for things that are not intelligence, like conscientiousness, openness, and socioeconomic status; the mean/standard deviation of scores differing for each year (so you’d have to calculate z-scores differently based on the year participants took the test); and the fact that it is much easier to hit the ceiling on the SAT/ACT/GRE (2-3 S.D., 1 in 741 at the rarest) than it is to hit the ceiling of a reliable IQ test (4 S.D., which is about 1 in 30,000). Statistical models like those involved in GWASes follow a simple rule: crap in, crap out. If you want to find a lot of statistically significant SNPs for intelligence and you use a shoddy proxy like standardized test scores or an incomplete IQ test score as your phenotype, your GWAS is going to end up producing a bunch of shoddy SNPs for “intelligence”. Sample size (which is still an unsolved problem for the reasons aforementioned) can partially make up for the small number of SNPs that reach genome-wide significance, but it won’t get rid of entangled irrelevant SNPs if you’re measuring something other than straight-up full-scale IQ. I really hate to be so pessimistic here, but it’s important to be realistic about these kinds of things, especially if you’re relying on this to play a critical role in your project’s success.
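To make the year-to-year renorming issue concrete, here is a minimal sketch of what per-cohort z-scoring would have to look like. The per-year means and SDs below are made-up illustration values, not real College Board norms:

```python
# Hypothetical sketch: putting SAT scores taken in different years onto
# a common scale before using them as a GWAS phenotype.
# The (mean, sd) pairs are ILLUSTRATIVE placeholders, not real norms.

YEAR_NORMS = {
    1993: (900, 210),
    2005: (1020, 195),
    2019: (1060, 200),
}

def sat_to_z(score: float, year: int) -> float:
    """Convert a raw SAT score to a z-score relative to its cohort year."""
    mean, sd = YEAR_NORMS[year]
    return (score - mean) / sd

# Two identical raw scores from different years imply different z-scores:
print(sat_to_z(1200, 1993))  # ~1.43 SD above the (assumed) 1993 mean
print(sat_to_z(1200, 2019))  # 0.7 SD above the (assumed) 2019 mean
```

The point being that without per-year renorming, the same raw score silently encodes different amounts of ability depending on test year.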
Taiwan is one of the more tenable counter-examples to what I said that I also thought of, but there are still problems to overcome. In the UK Biobank, for example, the method of assessing “fluid intelligence”/“verbal-numerical ability” was totally abysmal: participants got 2 minutes to answer 13 IQ-test-esque multiple-choice questions, and their score was the number answered correctly in those 2 minutes. I hope I don’t need to explain why this is not an adequate measure of fluid intelligence, and any IQ predictor built on that data is probably totally useless. I don’t know how Taiwan assesses intelligence in their biobank, if at all, but if they do it anything like the UK Biobank did, that data will probably end up being similarly useless. Even then, there is still the problem of inadequate sample size if it’s not half a million or more, and my understanding is that all of this would take a long time to complete. My ultimate prediction regarding this obstacle is that in order to build an IQ predictor in a short amount of time, with enough quality data to uncover a sufficient abundance of causal alleles for intelligence, there will need to be monetary incentives for the sought hundreds of thousands of participants, actual full-scale IQ tests administered, and full genome sequencing. Again, I would be delighted to be wrong about all of this and I encourage anyone to reply with good reasons why I might be.
As mentioned in my reply, I would tend to agree if your goal was to only make a few edits and thus use an AAV only once or twice to accomplish this. This has been demonstrated to be relatively safe provided the right serotype is used, and there are even FDA-approved gene delivery therapies that use AAVs in the CNS. Even in these cases though, the risk of inducing an inflammatory response or killing cells is never zero even with correct dosing and single exposure, and for your purposes you would need to use at least hundreds of AAV injections to deliver hundreds of edits, and thousands of AAV injections to deliver thousands of edits. Again, barring some breakthrough in AAVs as delivery vectors, this number of uses in a single person’s CNS practically guarantees that you will end up inducing some significant/fatal inflammatory response or cytolysis. This is without even mentioning the problems of developing immunity to the viruses and low transduction efficiency, which are another couple of breakthroughs away from being solved.
You may find these two papers elucidating: one, two
This is interesting. This is the first time I’m hearing of these as they pertain to potential gene therapy applications. Here are some papers about them I found that you may find useful as you consider them as an option: one, two, three
To be candid with you, I was mostly just trying to play devil’s advocate regarding mosaicism. Like you mention, neurons accumulate random mutations over the lifespan anyways and it doesn’t seem to be detrimental necessarily, though one can’t disentangle the cognitive decline due to this small-scale mosaicism versus that due to aging in general. It’s also possible that having an order of magnitude increase in mosaicism (e.g., 1,000 random mutations across neurons to 10,000 random mutations across neurons) induces some phase transition in its latent perniciousness. Either way, if you solve either the transduction efficiency or immunological tolerance issues (if low transduction efficiency, just employ multiple rounds of the same edit repeatedly), mosaicism won’t be much of a problem if it was ever going to be one.
This is just measurement error and can be handled by normal psychometric approaches like SEM (e.g., GSEM). You lose sample efficiency, but there’s no reason you can’t measure and correct for the measurement error. What the error does is render the estimate for each allele too small (closer to zero from either direction), but if you know how much error there is, you can just multiply back up to recover the real effect you would see if you had been able to use a measurement with no error. In particular, for an editing approach, you don’t need to know the estimate at all; you only need to know that it is non-zero, because you are identifying the desired allele.
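A minimal sketch of the multiply-back-up step, using the standard Spearman disattenuation formula (genotype assumed error-free; the numbers are illustrative, not empirical):

```python
import math

def disattenuate(r_observed: float, test_reliability: float) -> float:
    """Spearman correction for attenuation: with classical measurement
    error in the phenotype, r_obs = r_true * sqrt(reliability), so
    dividing by sqrt(reliability) recovers the correlation you would
    see against an error-free measurement (genotype assumed exact)."""
    return r_observed / math.sqrt(test_reliability)

# A quiz with reliability .25 halves every standardized SNP effect:
print(disattenuate(0.02, 0.25))  # 0.04
```

As the comment says, for editing you only need the sign and non-zeroness to survive this correction, not a precise magnitude.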
So, every measurement on every individual you get, whether it’s EA or SAT or GRE or parental degree or a 5-minute web quiz, helps you narrow down the set of 10,000 alleles that matters from the starting set of a few million. They just might not narrow it down much, so it becomes a more decision-theoretic question of how expensive is which data to collect and what maximizes your bang-for-buck. (Historically, the calculus has favored low-quality measurements which could be collected on a large number of people.)
The problem could potentially be solved by conducting GWASes that identify the SNPs of things known to correlate with the proxy measure other than intelligence and then subtracting those SNPs, but like you mention later in your reply, the question is which approach is faster and/or cheaper. Unless there is some magic I don’t know about in GSEM, I can’t see a convincing reason why it would make intelligence SNPs rise to the top of lists ranked by effect size, especially with the sample size we would likely end up working with (<1 million). If you don’t know which SNPs contribute to intelligence versus something else, applying a flat factor to each allele’s effect size would just increase the scale of the differences rather than help distill out intelligence SNPs. Considering the main limitation of this project is the number of edits they’re wanting to make, minimizing the number of allele flips while maximizing the effect on intelligence is one of the major goals here (although I’ve already stated why I think this project is infeasible). Another important thing to consider is that the p-values of SNPs’ effects will be attenuated as the number of independent traits affecting the phenotype increases; if you’re only able to get 500,000 data points for a GWAS that uses the SAT as its phenotype, you will most likely have the majority of causal intelligence SNPs falling below the genome-wide significance threshold of p < 5 × 10^-8.
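To put a rough number on this dilution: the GWAS test statistic for a SNP scales with N·r², so a proxy that captures the target trait at correlation ρ inflates the sample size required for fixed power by 1/ρ². A quick sketch (the ρ values are rough figures pulled from this thread, not measured quantities):

```python
# Back-of-envelope for how proxy quality translates into required
# sample size. Assumes the chi-square statistic for a SNP scales with
# N * r^2 and the proxy only carries rho * r of the SNP-g signal.
# The rho values are ASSUMPTIONS for illustration.

def required_n_multiplier(proxy_g_correlation: float) -> float:
    """Factor by which sample size must grow, relative to an
    error-free g measurement, to keep per-SNP power constant."""
    return 1.0 / proxy_g_correlation ** 2

for label, rho in [("FSIQ (WAIS)", 0.95),
                   ("pre-1994 SAT", 0.82),
                   ("2-minute UKBB quiz", 0.60)]:
    print(f"{label}: need {required_n_multiplier(rho):.1f}x the sample size")
```

So a weak proxy does not change what is discoverable in principle, but it can multiply an already-infeasible sample size requirement severalfold.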
It’s also possible that optimizing peoples’ brains (or a group of embryos) for acing the SAT to the point where they have a 100% chance of achieving this brings us as close to a superintelligent human as we need until the next iteration of superintelligent human.
The tragedy of all of this is that it’s basically a money problem—if some billionaire could just unilaterally fund genome sequencing and IQ testing en masse and not get blocked by some government or other bureaucratic entity, all of this crap about building an accurate predictor would disappear and we’d only ever need to do this once.
More or less. If you have an impure measurement like ‘years of education’ which lumps in half intelligence and half other stuff (and you know this, even if you never have measurements of IQ and EDU and the other-stuff within individuals, because you can get precise genetic correlations from much smaller sample sizes where you compare PGSes & alternative methods like GCTA or cross-twin correlations), then you can correct the respective estimates of both intelligence and other-stuff, and you can pool with other GWASes on other traits/cohorts to estimate all of these simultaneously. This gets you estimates of each latent trait effect size per allele, and you just rank and select.
A statistical-significance threshold is irrelevant NHST mumbo-jumbo. What you care about is posterior probability of the causal variant’s effect being above the cost-safety threshold, whatever that may be, but which will have nothing at all to do with ‘genome-wide statistical significance’.
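For concreteness, here is one hedged sketch of that criterion under a conjugate normal model (zero-mean prior on effect sizes; all numbers are illustrative):

```python
# Sketch of ranking by posterior probability that a variant's true
# effect exceeds a cost-safety threshold, instead of thresholding on
# p < 5e-8. Assumes a zero-mean normal prior over effect sizes and a
# normal likelihood; every number below is illustrative.

from math import erf, sqrt

def normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def posterior_prob_above(beta_hat: float, se: float,
                         prior_sd: float, threshold: float) -> float:
    """Conjugate normal-normal update, then P(beta_true > threshold)."""
    shrink = prior_sd**2 / (prior_sd**2 + se**2)  # shrinkage toward 0
    post_mean = shrink * beta_hat
    post_sd = sqrt(shrink) * se
    return 1.0 - normal_cdf((threshold - post_mean) / post_sd)

# A 'sub-significant' estimate (z = 3, p ~ 0.003, nowhere near 5e-8)
# can still be very probably in the desired direction:
print(posterior_prob_above(beta_hat=0.03, se=0.01,
                           prior_sd=0.02, threshold=0.0))  # ~0.996
```

The ranking this produces is what matters for edit selection; the genome-wide significance line never enters into it.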
I’m aware of this, but if you’re just indiscriminately shoveling heaps of edits into someone’s genome based on a GWAS with too low a sample size to reveal causal SNPs for the desired trait, you’ll be editing a whole bunch of what are actually tags, a whole bunch of things related to independent traits other than intelligence, and a whole bunch of random irrelevant alleles that made it into your selection by chance. This is a sure-fire way to make a therapy that has no chance of working, and if an indiscriminate shotgun approach like this is used in experiments, the combinatorics of the matter dictates that there are more possible sure-to-fail multiplex genome editing therapies than there are humans on Earth, let alone humans willing to be guinea pigs for an experiment like this. Having a statistical significance threshold at least imposes a bar for SNPs to pass that makes the therapy less of an assured suicide mission.
EDIT: misinterpreted what other party was saying.
What I said was “What you care about is posterior probability of the causal variant’s effect being above the cost-safety threshold”. If you are ‘indiscriminately shoveling’, then you apparently did it wrong.
Pretty much all SNPs are related to something or other. The question is what is the average effect. Given the known genetic correlations, if you pick the highest posterior probability ones for intelligence, then the average effect will be good.
(And in any case, one should be aiming for maximizing the gain across all traits as an index score.)
If they’re irrelevant, then there’s no problem.
No it’s not. If you’re using common SNPs which already exist, why would it ‘have no chance of working’? If some random SNP had some devastating effect on intelligence, then it would not be ranked high.
What if 23andMe offered a $20 discount for uploading old SAT scores? I guess someone would set up a site that generates realistically distributed fake SAT scores that everyone would use. Is there a standardized format for results that would be easy to retrieve and upload but hard to fake? Eh, idk, maybe not. Could a company somehow arrange to buy the scores of consenting customers directly from the testing agency? Agree that this seems hard.
This seems unduly pessimistic to me. The whole interesting thing about g is that it’s easy to measure and correlates with tons of stuff. I’m not convinced there’s any magic about FSIQ compared to shoddier tests. There might be important stuff that FSIQ doesn’t measure very well that we’d ideally like to select/edit for, but using FSIQ is much better than nothing. Likewise, using a poor man’s IQ proxy seems much better than nothing.
I wouldn’t call it magic, but what makes FSIQ tests special is that they’re specifically crafted to estimate g. To your point, anything that involves intelligence (SAT, ACT, GRE, random trivia quizzes, tying your shoes) will positively correlate with g even if only weakly, but the correlations between g factor scores and full-scale IQ scores from the WAIS have been found to be >0.95, according to the same Wikipedia page you linked in a previous reply to me. Like both of us mentioned in previous replies, using imperfect proxy measures would necessitate multiplying your sample size because of diluted p-values and effect sizes, along with selecting for many things that are not intelligence. There are more details about this in my reply to gwern’s reply to me.
I may have missed your point: you seem more concerned about selecting for unwanted covariates than ‘missing things’, which is reasonable. I might remake the same argument by suspecting that FSIQ probably has some weird covariates too, but that seems weaker. E.g. if a proxy measure correlates with FSIQ at .7, then the ‘other stuff’ (insofar as it is heritable variation and not just noise) will also correlate with the proxy at ~.7, and so by selecting on this measure you’d be selecting quite strongly for the ‘other stuff’, which, yeah, isn’t great. FSIQ, insofar as it has any weird unwanted covariates, would probably be much less correlated with them than .7.
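The ~.7 figure falls out of a simple variance decomposition; a quick check (assuming the ‘other stuff’ is a single composite orthogonal to FSIQ, and ignoring testing noise):

```python
# If a proxy correlates with FSIQ at r, the non-FSIQ share of its
# variance is 1 - r^2, so the proxy correlates with the orthogonal
# 'other stuff' composite at sqrt(1 - r^2). Purely a variance
# decomposition; assumes no measurement noise.

from math import sqrt

def other_stuff_correlation(r_proxy_fsiq: float) -> float:
    return sqrt(1.0 - r_proxy_fsiq ** 2)

print(round(other_stuff_correlation(0.7), 2))   # 0.71: nearly as strong as the g loading
print(round(other_stuff_correlation(0.95), 2))  # 0.31: FSIQ leaves far less room for covariates
```

Which is why a .7-correlated proxy loads almost as heavily on its covariates as on g, while a .95-correlated FSIQ battery does not.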
I might end up eating my words on the delivery problem. Something has just come out a few days ago that renewed a bit of my optimism, see here. According to the findings in this pre-print, it is possible to shield AAVs from the immune system using protein vaults that the immune system recognizes as self. It is not perfect though; although VAAV results in improved transduction efficiency even in the presence of neutralizing antibodies, it still only results in transduction of ~4% of cells if neutralizing antibodies are present. This means you’d need to cross your fingers and hope that 1) the patient doesn’t already have naturally extant neutralizing antibodies and 2) they don’t develop them over the course of the hundreds/thousands of VAAV you’re going to give them. In the paper, it is stated that AAV gets packaged in the vaults only to an extent rather than completely. So, more than likely, even if you’re injecting 99% VAAV and 1% naked AAV, if you do this 100 times you are almost sure to develop neutralizing antibodies to that 1% of naked AAV (unless they have a way to completely purify VAAV that removes all naked AAV). One way to combat the transduction problem post-innocuation though is using multiple injections of the same edit in order to approximate 100% transduction, though I’m pessimistic that this will work because there is probably a good reason that only 4% of cells were transducible; something might be different about them than the rest of cells, so you might receive diminishing transduction returns with each injection. They also still need to demonstrate that these work in vivo and that they can be routed to the CNS. Nonetheless, I’m excited to see how this shakes out.
Thanks for leaving such thorough and thoughtful feedback!
The SAT is heavily g-loaded: r = .82 according to Wikipedia, so ~2/3 of the variance is coming from g, ~1/3 from other stuff (minus whatever variance is testing noise). So naively, assuming no noise and that the genetic correlations mirror the phenotype correlations, if you did embryo selection on SAT, you’d be getting .82*h_pred/sqrt(2) SDs g and .57*h_pred/sqrt(2) SDs ‘other stuff’ for every SD of selection power you exert on your embryo pool (h_pred^2 is the variance in SAT explained by the predictor, we’re dividing by sqrt(2) because sibling genotypes have ~1/2 the variance as the wider population). Which is maybe not good; maybe you don’t want that much of the ‘other stuff’, e.g. if it includes personality traits.
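The same back-of-envelope as code, with h_pred set to an assumed illustrative predictor accuracy (not an empirical value):

```python
# kman's embryo-selection arithmetic: per SD of selection pressure on
# the proxy, gains split between g and 'other stuff' according to the
# proxy's g-loading. h_pred is an ASSUMED illustrative value; assumes
# no testing noise and genetic correlations mirroring phenotypic ones.

from math import sqrt

def selection_gains(r_g: float, h_pred: float):
    """Returns (SDs of g, SDs of 'other stuff') gained per SD of
    selection on the proxy. Dividing by sqrt(2) because sibling
    genotypes have ~half the variance of the wider population."""
    r_other = sqrt(1.0 - r_g ** 2)
    return (r_g * h_pred / sqrt(2), r_other * h_pred / sqrt(2))

g_gain, other_gain = selection_gains(r_g=0.82, h_pred=0.5)
print(round(g_gain, 3), round(other_gain, 3))  # ~0.29 SDs g, ~0.20 SDs other stuff
```

So with an SAT-grade proxy, roughly two units of g come packaged with one unit of ‘other stuff’ per unit of selection.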
It looks like the SAT isn’t correlated much with personality at all. The biggest correlation is with openness, which is unsurprising due to the correlation between openness and IQ—I figured conscientiousness might be a bit correlated, but it’s actually slightly anticorrelated, despite being correlated with GPA. So maybe it’s more that you’re measuring specific abilities as well as g (e.g. non-g components of math and verbal ability).
Another thing: if you have a test for which g explains the lion’s share of the heritable variance, but there are also other traits which contribute heritable variance, and the other traits are similarly polygenic as g (similar number of causal variants), then by picking the top-N expected effect size edits, you’ll probably mostly/entirely end up editing variants which affect g. (That said, if the other traits are significantly less polygenic than g then the opposite would happen.)
Getting old SAT scores could be much cheaper, I imagine (though doing this would still be very difficult). Also, as GeneSmith pointed out we aren’t necessarily limited to western countries. Assembling a large biobank including IQ scores or a good proxy might be much cheaper and more socially permissible elsewhere.
I do basically expect the delivery problem will be gated by missing breakthroughs, since otherwise I’d expect the literature to be full of more impressive results than it actually is. (E.g. why has no one used angiopep-coated LNPs to deliver editors to mouse brains, as far as I can find? I guess it doesn’t work very well? Has anyone actually tried, though?)
Ditto for editors, though I’m somewhat more optimistic there for a handful of reasons:
- sequence-dependent off-targets can be predicted
- so you can maybe avoid edits that risk catastrophic off-targets
- it’s unclear how big of a problem errors at noncoding target sites will be (though after reading some replies pointing out that regulatory binding sites are highly sensitive, I’m a bit more pessimistic about this than I was)
- even if they are a big problem, dCas9-based ABEs have extremely low rates of indels and incorrect base conversions, though bystanders are still a concern
- though if you restrict yourself to ABEs and are careful to avoid bystanders, your pool of variants to target shrinks way down
I mean, your basic argument was “you’re trying to do 1000 edits, and the risks will mount with each edit you do”, which yeah, maybe I’m being too optimistic here (e.g. even if not a problem at most target sites, errors will predictably be a big deal at some target sites, and it might be hard to predict which sites with high accuracy).
It’s not clear to me how far out the necessary breakthroughs are “by default” and how much they could be accelerated if we actually tried, in the sense of how electric cars weren’t going anywhere until Musk came along and actually tried (though besides sounding crazy ambitious, maybe this analogy doesn’t really work if breakthroughs are just hard to accelerate with money, and AFAIK electric cars weren’t really held up by any big breakthroughs, just lack of scale). Getting delivery+editors down would have a ton of uses besides intelligence enhancement therapy; you could target any mono/oligo/poly-genic diseases you wanted. It doesn’t seem like the amount of effort currently being put in is commensurate with how much it would be worth, even putting ‘enhancement’ use cases aside.
My impression is neurons are really noisy, and so probably not very sensitive to small perturbations in timing/signalling characteristics. I guess things could be different if the differences are permanent rather than transient, though I also wouldn’t be surprised if there was a lot of ‘spatial’ noise/variation in neural characteristics which the brain is able to cope with. Maybe this isn’t the sort of variation you mean. I completely agree that it’s more likely to be detrimental than beneficial; it’s a question of how badly detrimental.
Another thing to consider: do the causal variants additively influence an underlying lower dimensional ‘parameter space’ which then influences g (e.g. degree of expression of various proteins or characteristics downstream of that)? If this is the case, and you have a large number of causal variants per ‘parameter’, then if your cells get each edit with about the same frequency on average, then even if there’s a ton of mosaicism at the variant level there might not be much at the ‘parameter’ level. I suspect the way this would actually work out is that some cells will be easier to transfect than others (e.g. due to the geography of the extracellular space that the delivery vectors need to diffuse through), so you’ll have some cells getting more total edits than others: a mix of cells with better and worse polygenic scores, which might lead to the discordance problems you suggested if the differences are big enough.
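A toy simulation of that averaging-out effect (it assumes each cell receives each edit independently at the same rate, which the transfectability caveat about extracellular geography would break):

```python
# Toy model: each cell independently receives each of n_variants edits
# with probability p. The per-cell fraction of edits received
# concentrates around p as n_variants grows, so mosaicism at the
# aggregate 'parameter' level shrinks even when mosaicism at the
# variant level is total. All numbers are illustrative.

import random

random.seed(0)

def per_cell_fraction_sd(n_variants: int, p: float,
                         n_cells: int = 2000) -> float:
    """SD across cells of the fraction of edits each cell received."""
    fractions = [
        sum(random.random() < p for _ in range(n_variants)) / n_variants
        for _ in range(n_cells)
    ]
    mean = sum(fractions) / n_cells
    return (sum((f - mean) ** 2 for f in fractions) / n_cells) ** 0.5

print(per_cell_fraction_sd(10, 0.3))    # sizeable cell-to-cell spread
print(per_cell_fraction_sd(1000, 0.3))  # ~10x smaller: variants average out
```

Analytically the spread is sqrt(p(1-p)/n_variants), so each 100x increase in causal variants per ‘parameter’ cuts parameter-level mosaicism 10x, under the independence assumption.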
BMI seems harder than in-vivo editing to me. Wouldn’t you need a massive number of connections (10M+?) to even begin having any hope of making people qualitatively smarter? Wouldn’t you need to find an algorithm that the brain could ‘learn to use’ so well that it essentially becomes integrated as another cortical area or can serve as an ‘expansion card’ for existing cortical areas? Would you just end up bottlenecked by the characteristics of the human neurons (e.g. low information capacity due to noise)?
The article that Wikipedia cites for that factoid, Frey & Detterman 2004, uses the National Longitudinal Survey of Youth 1979 for its data, which included the SAT and ASVAB (this is what they used to estimate IQ, so they first needed to find the correlation between the ASVAB and actual FSIQ) scores for the samples. This introduces the huge caveat that the SAT has changed drastically since this study was conducted and has likely not been nearly as strongly correlated with g since 1994. That is when they began recentering scores and changing the scoring methodology, making year-to-year comparisons of scores no longer apples to apples. The real killer was their revision of the math and verbal sections to mostly include questions that “approximate more closely the skills used in college and high school work”, get rid of “contrived word problems” (e.g., the types of verbal ability questions you’d see on an IQ test), and include “real-world” problems that may be more relevant to students. Since it became more focused on assessing knowledge rather than aptitude, this overhaul of the scoring and question format made it much more closely reflect a typical academic benchmark exam rather than an assessment of general cognitive ability. This decreased its predictive power for general intelligence and increased its predictive power for high school GPA, as well as other things that correlate with high school GPA like academic effort, openness, and SES. It’s for these reasons that Mensa and other psychometrics societies stopped accepting the SAT as a proxy for IQ unless you took it prior to 1994. I’ve taken both the SAT and ACT and I cannot imagine the ACT is much better (a 2004 study showed r=0.73).
My guess is that the GRE would be much more correlated with general intelligence than either of the other two tests (still imperfectly so, wouldn’t put it >0.8), but then the problem is that a much smaller fraction of the population has taken the GRE and there is a large selection bias as to who takes it. Same with something like the LSAT. I still think the only way you will get away with cheaply assessing general intelligence is via an abridged IQ test such as that offered by openpsychometrics.org if it was properly normed and made to be a little longer.
I agree, but then you’re limiting yourself to whatever number of polymorphisms are left over after what is presumably a pseudo-arbitrary threshold, and you’d need a much larger sample size because the effect sizes and p-values of SNPs would be diluted because you’d now have many more polymorphisms contributing to the phenotype. Like you suggest, it is also a large inferential leap to assume this would exclusively result in variants that affect g. Refer to my reply to gwern for more about this.
Refer to the first paragraph of this reply and my reply to GeneSmith.
I agree, I think the delivery problem is a much taller mountain to climb than the editor problem. One of the reasons for this is the fact that editing is generally a tractable organic chemistry problem and delivery is almost exclusively an intractable systems biology problem. Considering the progress that precision genome editing tools have made in the past 10 years, I think it is reasonable to rely on other labs to discover ways to shave down the noxious effects of editing alone to near negligibility.
As you alluded to, the difference is that one thing was basically solved already. Making leaps forward in biology requires an insane amount of tedium and luck. Genius is certainly important too, but like with the editing versus delivery tractability problem, engineering things like batteries involves more tractable sub-problems than getting things to work in noisy, black box, highly variable wetware like humans.
Frankly, I know much less about this topic than the other stuff I’ve been talking about so my opinions are less strong for BMIs, but what has made me optimistic about such things is the existence of brain implants that have cured peoples’ depression, work showing that transcranial magnetic stimulation has the potential to enhance certain cognitive domains, and existing BMIs that cure paralysis at the level of the motor cortex. Like other things I mentioned, this also seems like somewhat of a more tractable problem, considering computational neuroscience is a very math intensive field of study and AI has vast potential to assist us in figuring it out. If the problem eventually comes down to needing more and more connections, I cannot imagine it will remain a problem for long, since it sounds relatively easier to figure out how to insert more fine connections into the brain than the stuff we’ve been discussing.
I should mention, when I wrote this I was assuming a simple model where the causal variants for g and the ‘other stuff’ are disjoint, which is probably unrealistic—there’d be some pleiotropy.