I am more optimistic than you here. I think it is enough to get people who have already had their genomes sequenced through 23andMe or some other consumer genomics service to either take an online IQ test or submit their SAT scores. You could also cross-check this against other data such people submit to validate their answers and determine whether they are plausible.
I think this could potentially be done for a few million dollars rather than $50 million. In fact, companies like GenomeLink.io already offer these kinds of third-party data analysis services today.
Also, we aren’t limited to western countries. If China or Taiwan or Japan or any other country creates a good IQ predictor, it can be used for editing. Ancestry doesn’t matter much for editing purposes, only for embryo selection.
Would the quality of such tests be lower than those of professionally administered IQ tests?
Of course. But sample size cures many ills.
I have experience attempting things like what you’re suggesting 23andMe do; I briefly ran a startup unrelated to genomics, and I also ran a genomics study at my alma mater. Both involved trying to get consumers or test subjects to engage with links, emails, online surveys, tests, etc., and let me be the first to tell you that this is hard for any survey longer than your average customer satisfaction survey. If 23andMe has ~14 million customers worldwide and launches a campaign to estimate the IQ scores of its existing customers using an abridged online IQ test (which would take at least ~15-20 minutes if it is at all useful), it is optimistic to think even 140,000 customers will respond.

This prediction has an empirical basis: 23andMe conducted a consumer experience survey in 2013 and invited the customers most likely to respond: those who were over the age of 30, had logged into their 23andMe.com account within the two-year period prior to November 2013, were not part of any other 23andMe disease research study, and had opted to receive health results. This amounted to an anemic 20,000 customers out of its hundreds of thousands; considering 23andMe is cited as having had ~500,000 customers in 2014, we can reasonably assume it had at least ~200,000 customers in 2013. To make our estimate of the invitation rate generous, say it had 200,000 customers in 2013, meaning 10% of its customers received an invitation to complete the survey. Even out of this 10%, slightly less than 10% of that 10% responded to a 98-question survey, so a generous estimate of how many of their customers they got to take this survey is 1%. And this was just a consumer experience survey, which does not have nearly as much emotional and cognitive friction dissuading participants as something like an IQ test. It is counterintuitive and demoralizing, but anyone with experience in these kinds of things will tell you the same.

If 23andMe instead asked customers to submit SAT/ACT/GRE scores, there are many other problems to account for (beyond a likely response rate of <=1% of the total customer base): dishonest or otherwise unreliable reporting; selecting for things that are not intelligence, like conscientiousness, openness, and socioeconomic status; the mean/standard deviation of scores differing by year (so you’d have to calculate z-scores differently based on the year participants took the test); and the fact that it is much easier to hit the ceiling on the SAT/ACT/GRE (2-3 S.D., 1 in 741 at the rarest) than the ceiling of a reliable IQ test (4 S.D., about 1 in 30,000).

Statistical models like those involved in GWASes follow one of many simple rules: crap in, crap out. If you want to find a lot of statistically significant SNPs for intelligence and you use a shoddy proxy like standardized test score or an incomplete IQ test score as your phenotype, your GWAS is going to produce a bunch of shoddy SNPs for “intelligence”. Sample size (which is still an unsolved problem for the reasons aforementioned) has the potential to make up for obtaining a low number of SNPs that reach genome-wide significance, but it won’t get rid of entangled irrelevant SNPs if you’re measuring something other than straight-up full-scale IQ.

I really hate to be so pessimistic here, but it’s important to be realistic about these kinds of things, especially if you’re relying on them to play a critical role in your project’s success.
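The ceiling rarities and the response-rate arithmetic above are easy to sanity-check; here is a minimal Python sketch (the 200,000-customer figure is the generous assumption from above, and scores are treated as standard normal):

```python
from scipy.stats import norm

# Rarity of hitting a test ceiling, treating scores as standard normal (SD units).
for label, sd in [("SAT/ACT/GRE ceiling (~3 S.D.)", 3),
                  ("reliable IQ test ceiling (~4 S.D.)", 4)]:
    p = norm.sf(sd)                       # upper-tail probability
    print(f"{label}: 1 in {1 / p:,.0f}")  # ~1 in 741 and ~1 in 31,574

# Response-rate arithmetic from the 2013 survey example:
customers = 200_000          # generous assumption for 23andMe's 2013 customer base
invited   = 20_000           # the invited subset (10%)
responded = invited * 0.10   # "slightly less than 10%" of invitees
print(responded / customers) # ~0.01, i.e. ~1% of the customer base
```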
Taiwan is one of the more tenable counterexamples I also thought of to what I said, but there are still problems to overcome. In the UK Biobank, for example, the method of assessing “fluid intelligence”/“verbal-numerical ability” was totally abysmal: participants got 2 minutes to answer 13 IQ-test-esque multiple-choice questions, and their score was simply the number answered correctly within those 2 minutes. I hope I don’t need to explain why this is not an adequate measure of fluid intelligence, and any IQ predictor built on that data is probably totally useless. I don’t know how Taiwan assesses intelligence in its biobank, if at all, but if it is done anything like the UK Biobank did it, that data will probably end up similarly useless. Even then, there is still the problem of inadequate sample size if the cohort is not half a million or more, and, as I understand it, all of this would take a long time to complete. My ultimate prediction regarding this obstacle is that in order to build an IQ predictor in a short amount of time, with enough quality data to uncover a sufficient abundance of causal alleles for intelligence, there will need to be monetary incentives for the sought hundreds of thousands of participants, actual full-scale IQ tests administered, and full genome sequencing. Again, I would be delighted to be wrong about all of this, and I encourage anyone to reply with good reasons why I might be.
So it sounds like there are potential solutions here and this isn’t necessarily a showstopper, especially if we can derisk using animal testing in cows or pigs.
As mentioned in my reply, I would tend to agree if your goal was to only make a few edits and thus use an AAV only once or twice to accomplish this. This has been demonstrated to be relatively safe provided the right serotype is used, and there are even FDA-approved gene delivery therapies that use AAVs in the CNS. Even in these cases though, the risk of inducing an inflammatory response or killing cells is never zero even with correct dosing and single exposure, and for your purposes you would need to use at least hundreds of AAV injections to deliver hundreds of edits, and thousands of AAV injections to deliver thousands of edits. Again, barring some breakthrough in AAVs as delivery vectors, this number of uses in a single person’s CNS practically guarantees that you will end up inducing some significant/fatal inflammatory response or cytolysis. This is without even mentioning the problems of developing immunity to the viruses and low transduction efficiency, which are another couple of breakthroughs away from being solved.
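To make the “practically guarantees” claim concrete: even if each administration carried only a small independent chance of a serious adverse event, the cumulative risk compounds quickly. A toy calculation (the 1% per-dose figure is purely illustrative, not a measured clinical rate):

```python
# Cumulative probability of at least one serious event over n independent
# AAV administrations, each with per-dose risk p: 1 - (1 - p)^n.
p = 0.01                      # illustrative per-dose risk, not a clinical figure
for n in [2, 100, 1000]:
    print(f"{n:>4} doses: {1 - (1 - p) ** n:.5f}")
# 2 doses ~0.02, 100 doses ~0.63, 1000 doses ~0.99996
```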
This is an update for me. I didn’t previously realize that the mRNA for a base or prime editor could itself trigger the innate immune system. I wonder how serious of a concern this would actually be?
You may find these two papers elucidating: one, two
Yeah, one other delivery vector I’ve looked into over the last couple of days is extracellular vesicles. They seem to have basically zero problems with toxicity because the body already uses them to shuttle stuff around. And you can stick peptides on their surface, similar to what we proposed with lipid nanoparticles.
This is interesting. This is the first time I’m hearing of these as they pertain to potential gene therapy applications. Here are some papers about them I found that you may find useful as you consider them as an option: one, two, three
Thanks for this example. I don’t think I would be particularly worried about this in the context of off-target edits or indels (provided the distribution is similar to that of naturally occurring mutations), but I can see it potentially being an issue if the intelligence-modifying alleles themselves work via regulating something like the neuron’s internal clock.
To be candid with you, I was mostly just playing devil’s advocate regarding mosaicism. Like you mention, neurons accumulate random mutations over the lifespan anyway, and it doesn’t necessarily seem to be detrimental, though one can’t disentangle the cognitive decline due to this small-scale mosaicism from that due to aging in general. It’s also possible that an order-of-magnitude increase in mosaicism (e.g., going from 1,000 to 10,000 random mutations across neurons) induces some phase transition in its latent perniciousness. Either way, if you solve either the transduction-efficiency or the immunological-tolerance issue (if transduction efficiency is low, just apply multiple rounds of the same edit), mosaicism won’t be much of a problem, if it was ever going to be one.
You could elect to use proxy measures like educational attainment, SAT/ACT/GRE score, most advanced math class completed, etc., but my intuition is that they are influenced by too many things other than pure g to be useful for the desired purpose. It’s possible that I’m being too cynical about this obstacle and I would be delighted if someone could give me good reasons why I’m wrong.
This is just measurement error and can be handled by normal psychometric approaches like SEM (eg. GSEM). You lose sample efficiency, but there’s no reason you can’t measure and correct for the measurement error. What the error does is render the estimates of each allele too small (closer to zero from either direction), but if you know how much error there is, you can just multiply back up to recover the real effect you would see if you had been able to use measurement with no error. In particular, for an editing approach, you don’t need to know the estimate at all—you only need to know that it is non-zero, because you are identifying the desired allele.
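A minimal simulation of the attenuation-and-correction point (all numbers illustrative): the estimated allele effect on a noisy proxy shrinks by the square root of the proxy’s reliability, and dividing by that factor recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta_true, reliability = 200_000, 0.05, 0.6   # illustrative values

dosage = rng.binomial(2, 0.5, n)                 # genotype at one SNP
x = (dosage - dosage.mean()) / dosage.std()
y = beta_true * x + rng.normal(0, np.sqrt(1 - beta_true**2), n)   # true trait
# Noisy proxy, scaled so its variance stays ~1:
y_obs = np.sqrt(reliability) * y + np.sqrt(1 - reliability) * rng.normal(0, 1, n)

beta_hat = np.cov(x, y_obs)[0, 1] / np.var(x)
print(beta_hat)                          # ~beta_true * sqrt(reliability): attenuated
print(beta_hat / np.sqrt(reliability))   # disattenuated: ~beta_true
```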
So, every measurement on every individual you get, whether it’s EA or SAT or GRE or parental degree or a 5-minute web quiz, helps you narrow down the set of 10,000 alleles that matters from the starting set of a few million. They just might not narrow it down much, so it becomes a more decision-theoretic question of how expensive is which data to collect and what maximizes your bang-for-buck. (Historically, the calculus has favored low-quality measurements which could be collected on a large number of people.)
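One way to make the bang-for-buck question concrete: a proxy correlating r with the latent trait shrinks each standardized effect by roughly r, so a GWAS of N people on the proxy has about the power of a GWAS of N·r² people on the trait itself. A toy comparison (all correlations, costs, and the budget are made-up illustrations):

```python
# (proxy's correlation with g, cost per participant in $) -- illustrative only
candidates = {
    "full-scale IQ test": (1.00, 100),
    "SAT self-report":    (0.80, 5),
    "5-minute web quiz":  (0.45, 1),
}
budget = 5_000_000  # hypothetical

for name, (r, cost) in candidates.items():
    n = budget // cost
    print(f"{name:20s} N = {n:>9,}   effective N = {int(n * r**2):>9,}")
# The cheap, noisy measurements win on effective sample size per dollar,
# matching the historical calculus noted above.
```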
The problem could potentially be solved by conducting GWASes that identify the SNPs of things known to correlate with the proxy measure other than intelligence and then subtracting those SNPs, but like you mention later in your reply, the question is which approach is faster and/or cheaper. Unless there is some magic I don’t know about in GSEM, I can’t see a convincing reason why it would cause intelligence SNPs to rise to the top of lists ranked by effect size, especially at the sample sizes we would likely end up working with (<1 million). If you don’t know which SNPs contribute to intelligence versus something else, applying a flat factor to each allele’s effect size would just increase the scale of the differences rather than help distill out intelligence SNPs. Considering the main limitation of this project is the number of edits they’re wanting to make, minimizing the number of allele flips while maximizing the effect on intelligence is one of the major goals here (although I’ve already stated why I think this project is infeasible). Another important consideration is that the p-values of SNPs’ effects would be attenuated as the number of independent traits affecting the phenotype increases; if you’re only able to get 500,000 data points for a GWAS that uses the SAT as the phenotype, you will most likely have the majority of causal intelligence SNPs falling below the genome-wide significance threshold of p < 5 × 10^-8.
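A back-of-the-envelope power calculation shows how dilution pushes small causal effects under that bar (effect sizes are illustrative; the test statistic for a standardized per-allele effect β at sample size N is roughly z ≈ β√N):

```python
import numpy as np
from scipy.stats import norm

def gwas_power(beta, n, alpha=5e-8):
    """Two-sided power to detect a standardized per-allele effect at sample size n."""
    z_crit = norm.isf(alpha / 2)
    ncp = abs(beta) * np.sqrt(n)          # approximate noncentrality of the z-test
    return norm.sf(z_crit - ncp) + norm.cdf(-z_crit - ncp)

beta = 0.007                              # small causal effect (~0.005% of variance)
print(gwas_power(beta, 500_000))          # ~0.3 on a clean phenotype
print(gwas_power(beta * 0.7, 500_000))    # ~0.02 after diluting the effect by 30%
```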
It’s also possible that optimizing people’s brains (or a group of embryos) for acing the SAT, to the point where they have a 100% chance of doing so, gets us as close to a superintelligent human as we need until the next iteration of superintelligent human.
The tragedy of all of this is that it’s basically a money problem—if some billionaire could just unilaterally fund genome sequencing and IQ testing en masse and not get blocked by some government or other bureaucratic entity, all of this crap about building an accurate predictor would disappear and we’d only ever need to do this once.
The problem could potentially be solved by conducting GWASes that identify the SNPs of things known to correlate with the proxy measure other than intelligence and then subtracting those SNPs
More or less. If you have an impure measurement like ‘years of education’ which lumps in half intelligence and half other stuff (and you know this, even if you never have measurements of IQ and EDU and the other-stuff within individuals, because you can get precise genetic correlations from much smaller sample sizes where you compare PGSes & alternative methods like GCTA or cross-twin correlations), then you can correct the respective estimates of both intelligence and other-stuff, and you can pool with other GWASes on other traits/cohorts to estimate all of these simultaneously. This gets you estimates of each latent trait effect size per allele, and you just rank and select.
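A toy version of the “estimate latent effects, then rank and select” idea (the loadings here are invented for illustration; real Genomic SEM estimates them from genetic covariance matrices rather than assuming them):

```python
import numpy as np

# Suppose two observed phenotypes load on two latent traits,
# intelligence (INT) and "other stuff" (OTH):
#   EDU  = 0.7 * INT + 0.7 * OTH
#   QUIZ = 0.9 * INT + 0.2 * OTH
loadings = np.array([[0.7, 0.7],
                     [0.9, 0.2]])

# Observed per-allele GWAS effects of one SNP on each phenotype:
beta_obs = np.array([0.010, 0.011])

# Solve for the per-allele effects on the latent traits; rank SNPs by their
# estimated effect on INT and select the top ones for editing.
beta_latent = np.linalg.solve(loadings, beta_obs)
print(beta_latent)   # [effect on INT, effect on OTH]
```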
you will most likely have the majority of causal intelligence SNPs falling below the genome-wide significance threshold of p < 5 × 10^-8.
A statistical-significance threshold is irrelevant NHST mumbo-jumbo. What you care about is posterior probability of the causal variant’s effect being above the cost-safety threshold, whatever that may be, but which will have nothing at all to do with ‘genome-wide statistical significance’.
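A minimal sketch of what “posterior probability of the effect exceeding the threshold” could look like, assuming a normal prior over effect sizes and a normal likelihood (all numbers invented): a SNP at z ≈ 4, nowhere near genome-wide significance, can still have a ~99% posterior probability of a worthwhile effect.

```python
import numpy as np
from scipy.stats import norm

def prob_effect_above(beta_hat, se, prior_sd, threshold):
    """Posterior P(true effect > threshold) under a N(0, prior_sd^2) prior
    and a N(true effect, se^2) likelihood for the GWAS estimate."""
    shrink = prior_sd**2 / (prior_sd**2 + se**2)
    post_mean = shrink * beta_hat
    post_sd = np.sqrt(shrink) * se
    return norm.sf(threshold, loc=post_mean, scale=post_sd)

# z = 0.008 / 0.002 = 4, i.e. p ~ 6e-5, far short of 5e-8, yet:
print(prob_effect_above(beta_hat=0.008, se=0.002, prior_sd=0.01, threshold=0.003))
# ~0.99
```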
A statistical-significance threshold is irrelevant NHST mumbo-jumbo. What you care about is posterior probability of the causal variant’s effect being above the cost-safety threshold, whatever that may be, but which will have nothing at all to do with ‘genome-wide statistical significance’.
I’m aware of this, but if you’re just indiscriminately shoveling heaps of edits into someone’s genome based on a GWAS with too low a sample size to reveal causal SNPs for the desired trait, you’ll be editing a whole bunch of what are actually tags, a whole bunch of things that are related to independent traits other than intelligence, and a whole bunch of random irrelevant alleles that made it into your selection by random chance. This is a sure-fire way to make a therapy that has no chance of working, and if an indiscriminate shotgun approach like this is used in experiments, the combinatorics of the matter dictates that there are more possible sure-to-fail multiplex genome editing therapies than there are humans on Earth, let alone humans willing to be guinea pigs for an experiment like this. Having a statistical significance threshold at least imposes a bar for SNPs to pass that makes the therapy less of a guaranteed suicide mission.

EDIT: misinterpreted what the other party was saying.
if you’re just indiscriminately shoveling heaps of edits into someone’s genome based on a GWAS with too low a sample size to reveal causal SNPs for the desired trait, you’ll be editing a whole bunch of what are actually tags,
What I said was “What you care about is posterior probability of the causal variant’s effect being above the cost-safety threshold”. If you are ‘indiscriminately shoveling’, then you apparently did it wrong.
a whole bunch of things that are related to independent traits other than intelligence,
Pretty much all SNPs are related to something or other. The question is what is the average effect. Given the known genetic correlations, if you pick the highest posterior probability ones for intelligence, then the average effect will be good.
(And in any case, one should be aiming for maximizing the gain across all traits as an index score.)
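A sketch of the index-score idea (all effect sizes and weights invented): score each candidate edit by its posterior mean effects across several traits, weighted by how much each trait is valued, and rank on the combined index.

```python
import numpy as np

# Rows: candidate SNPs; columns: posterior mean effects on [IQ, health, height].
effects = np.array([
    [0.010, -0.002,  0.001],
    [0.004,  0.015,  0.000],
    [0.008,  0.000, -0.003],
])
weights = np.array([1.0, 0.8, 0.1])   # illustrative value placed on each trait

index = effects @ weights             # one combined score per candidate edit
print(np.argsort(index)[::-1])        # edit-priority order, best first
```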
and a whole bunch of random irrelevant alleles that made it into your selection by random chance.
If they’re irrelevant, then there’s no problem.
This is a sure-fire way to make a therapy that has no chance of working,
No it’s not. If you’re using common SNPs which already exist, why would it ‘have no chance of working’? If some random SNP had some devastating effect on intelligence, then it would not be ranked high.
Even out of this 10%, slightly less than 10% of that 10% responded to a 98-question survey, so a generous estimate of how many of their customers they got to take this survey is 1%. And this was just a consumer experience survey, which does not have nearly as much emotional and cognitive friction dissuading participants as something like an IQ test.
What if 23andMe offered a $20 discount for uploading old SAT scores? I guess someone would set up a site that generates realistically distributed fake SAT scores that everyone would use. Is there a standardized format for results that would be easy to retrieve and upload but hard to fake? Eh, idk, maybe not. Could a company somehow arrange to buy the scores of consenting customers directly from the testing agency? Agree that this seems hard.
Statistical models like those involved in GWASes follow one of many simple rules: crap in, crap out. If you want to find a lot of statistically significant SNPs for intelligence and you use a shoddy proxy like standardized test score or an incomplete IQ test score as your phenotype, your GWAS is going to produce a bunch of shoddy SNPs for “intelligence”. Sample size (which is still an unsolved problem for the reasons aforementioned) has the potential to make up for obtaining a low number of SNPs that reach genome-wide significance, but it won’t get rid of entangled irrelevant SNPs if you’re measuring something other than straight-up full-scale IQ.
This seems unduly pessimistic to me. The whole interesting thing about g is that it’s easy to measure and correlates with tons of stuff. I’m not convinced there’s any magic about FSIQ compared to shoddier tests. There might be important stuff that FSIQ doesn’t measure very well that we’d ideally like to select/edit for, but using FSIQ is much better than nothing. Likewise, using a poor man’s IQ proxy seems much better than nothing.
I wouldn’t call it magic, but what makes FSIQ tests special is that they’re specifically crafted to estimate g. To your point, anything that involves intelligence (SAT, ACT, GRE, random trivia quizzes, tying your shoes) will positively correlate with g, even if only weakly, but the correlations between g factor scores and full-scale IQ scores from the WAIS have been found to be >0.95, according to the same Wikipedia page you linked in a previous reply to me. Like both of us mentioned in previous replies, using imperfect proxy measures would necessitate multiplying your sample size, because of attenuated effect sizes (and correspondingly weaker p-values), along with selecting for many things that are not intelligence. There are more details about this in my reply to gwern’s reply to me.
This seems unduly pessimistic to me. The whole interesting thing about g is that it’s easy to measure and correlates with tons of stuff. I’m not convinced there’s any magic about FSIQ compared to shoddier tests. There might be important stuff that FSIQ doesn’t measure very well that we’d ideally like to select/edit for, but using FSIQ is much better than nothing. Likewise, using a poor man’s IQ proxy seems much better than nothing.
This may have missed your point: you seem more concerned about selecting for unwanted covariates than about ‘missing things’, which is reasonable. I might remake the same argument by suspecting that FSIQ probably has some weird covariates too, but that seems weaker. E.g., if a proxy measure correlates with FSIQ at .7, then the ‘other stuff’ (insofar as it is heritable variation and not just noise) will also correlate with the proxy at about .7, and so by selecting on this measure you’d be selecting quite strongly for the ‘other stuff’, which, yeah, isn’t great. FSIQ, insofar as it has any weird unwanted covariates, would probably be much less correlated with them than .7.
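The arithmetic behind that: if proxy = .7·FSIQ + √(1−.7²)·other, the ‘other’ loading is √(1−.49) ≈ .71. A quick simulation of what selecting hard on such a proxy does (purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
r = 0.7
fsiq  = rng.normal(size=1_000_000)
other = rng.normal(size=1_000_000)             # heritable non-g variation
proxy = r * fsiq + np.sqrt(1 - r**2) * other   # correlates .7 with FSIQ

top = proxy > np.quantile(proxy, 0.99)         # select the top 1% on the proxy
print(fsiq[top].mean(), other[top].mean())     # both shifted ~1.9 SD: the 'other
                                               # stuff' comes along almost
                                               # one-for-one with g
```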