This would first require that you find an overlap of test subjects five hundred thousand strong who will not only volunteer to have their entire genome sequenced (a SNP array could be used to cut costs if you’re willing to sacrifice the breadth of variants interrogated), but will also sit down for an hours-long, professionally administered IQ test like the WAIS-IV (again, you could use some abridged test to cut costs and increase the participation rate at the expense of lower-quality data).
I am more optimistic than you here. I think it is enough to get people who have already gotten their genomes sequenced through 23andMe or some other such consumer genomics service to either take an online IQ test or submit their SAT scores. You could also cross-check this against other data such people submit to validate their answers and determine whether they are plausible.
I think this could potentially be done for a few million dollars rather than 50 million. In fact, companies like GenomeLink.io already offer these kinds of third-party data analysis services today.
Also, we aren’t limited to western countries. If China or Taiwan or Japan or any other country creates a good IQ predictor, it can be used for editing purposes. Ancestry doesn’t matter much for editing purposes, only for embryo selection.
Would the quality of such tests be lower than those of professionally administered IQ tests?
Of course. But sample size cures many ills.
As far as delivery goes, the current state of these technologies will force you to use lipid nanoparticles because of the dangers of an inflammatory response being induced in the brain by an AAV, not to mention the risk of random cell death induction by AAVs, the causes of which are poorly understood.
I briefly looked into this and found these papers:
Adeno-Associated virus induces apoptosis during coinfection with adenovirus
I asked GPT4 whether adenoviruses enter the brain:
In general, adenoviruses are not commonly known to infect the brain or cause central nervous system diseases. Most adenovirus infections remain localized to the site where they first enter the body, such as the respiratory or gastrointestinal tracts. However, in rare cases, especially in individuals with weakened immune systems, adenoviruses can potentially spread to other organs, including the brain.
I also found this paper indicating much more problematic direct effects observed in mouse studies:
AAV ablates neurogenesis in the adult murine hippocampus
We demonstrate that neural progenitor cells (NPCs) and immature dentate granule cells (DGCs) within the adult murine hippocampus are particularly sensitive to rAAV-induced cell death. Cell loss is dose dependent and nearly complete at experimentally relevant viral titers. rAAV-induced cell death is rapid and persistent, with loss of BrdU-labeled cells within 18 hr post-injection and no evidence of recovery of adult neurogenesis at 3 months post-injection.
Also:
Efficient transduction of the dentate gyrus (DG) – without ablating adult neurogenesis – can be achieved by injection of rAAV2-retro serotyped virus into CA3
So it sounds like there are potential solutions here and this isn’t necessarily a showstopper, especially if we can derisk using animal testing in cows or pigs.
You will need to use plasmid DNA (as opposed to mRNA, which is where lipid nanoparticles currently shine) if you want to keep them non-immunogenic and avoid the same immunogenicity risks as AAVs, which will significantly reduce your transduction efficiency unless you develop another breakthrough.
This is an update for me. I didn’t previously realize that the mRNA for a base or prime editor could itself trigger the innate immune system. I wonder how serious of a concern this would actually be?
If it is serious, we could potentially deliver RNPs directly to the cells in question. I think this would be plausible to do with pretty much any delivery vector except AAVs.
I don’t really see how delivering a plasmid with the DNA for the editor will be any better than delivering mRNA. The DNA will be transcribed into the exact same mRNA you would have been delivering anyways, so if the mRNA for CRISPR triggers the innate immune system thanks to CpG motifs or something, putting it in a plasmid won’t help much.
Lipid nanoparticles, even though they’re generally much safer, still have the potential to be immunogenic or toxic following repeated doses or high enough concentrations, which is another hurdle because you will need to use them repeatedly considering the number of edits you’re wanting to make.
Yeah, one other delivery vector I’ve looked into over the last couple of days is extracellular vesicles. They seem to have basically zero problems with toxicity because the body already uses them to shuttle stuff around. And you can stick peptides on their surface similar to what we proposed with lipid nanoparticles.
The downside is they are harder to manufacture. You can make lipid nanoparticles by literally putting 4 ingredients plus mRNA inside a flask together and shaking it. EVs require manufacturing via human cell colonies and purification.
A simple thought experiment may change your mind about mosaicism in the brain: consider what would happen in the case of editing multiple loci (whether purposeful or accidental) that happen to play a role in a neuron’s internal clock. If you have a bunch of neurons releasing substrates that govern one’s circadian rhythm in a totally discordant manner, I’d have to imagine the outcome is that the organism’s circadian rhythm will be just as discordant. This can be extrapolated to signaling pathways in general among neurons, where again one could imagine that if every 3rd or 4th or nth neuron is receiving, processing, or releasing ligands in a different way than either the upstream or downstream neurons, the result is some discordance that is more likely to be destructive than beneficial.
Thanks for this example. I don’t think I would be particularly worried about this in the context of off-target edits or indels (provided the distribution is similar to that of naturally occurring mutations), but I can see it potentially being an issue if the intelligence-modifying alleles themselves work via regulating something like the neuron’s internal clock.
If this turns out to be an issue, one potential solution would be to exclude edits to genes that are problematic when mosaic. But this would probably be pretty difficult to validate in an animal model so that might just kill the project.
I am more optimistic than you here. I think it is enough to get people who have already gotten their genomes sequenced through 23andMe or some other such consumer genomics service to either take an online IQ test or submit their SAT scores. You could also cross-check this against other data such people submit to validate their answers and determine whether they are plausible.
I think this could potentially be done for a few million dollars rather than 50 million. In fact, companies like GenomeLink.io already offer these kinds of third-party data analysis services today.
Also, we aren’t limited to western countries. If China or Taiwan or Japan or any other country creates a good IQ predictor, it can be used for editing purposes. Ancestry doesn’t matter much for editing purposes, only for embryo selection.
Would the quality of such tests be lower than those of professionally administered IQ tests?
Of course. But sample size cures many ills.
I have experience attempting things like what you’re suggesting 23andMe do; I briefly ran a startup unrelated to genomics, and I also ran a genomics study at my alma mater. Both of these involved trying to get consumers or test subjects to engage with links, emails, online surveys, tests, etc., and let me be the first to tell you that this is hard for any survey longer than your average customer satisfaction survey. If 23andMe has ~14 million customers worldwide and they launch a campaign that aims to estimate the IQ scores of their extant customers using an abridged online IQ test (which would take at least ~15-20 minutes if it is at all useful), it is optimistic to think they will get even 140,000 customers to respond.

This prediction has an empirical basis; 23andMe conducted a consumer experience survey in 2013 and invited the customers most likely to respond: those who were over the age of 30, had logged into their 23andMe.com account within the two-year period prior to November 2013, were not part of any other 23andMe disease research study, and had opted to receive health results. This amounted to an anemic 20,000 customers out of its hundreds of thousands; considering 23andMe is cited to have had about ~500,000 customers in 2014, we can reasonably assume they had at least ~200,000 customers in 2013. To make our estimate of the invitation rate generous, we will say they had 200,000 customers in 2013, meaning 10% of their customers received an invitation to complete the survey. Even out of this 10%, slightly less than 10% of that 10% responded to a 98-question survey, so a generous estimate of how many of their customers they got to take this survey is 1%. And this was just a consumer experience survey, which does not have nearly as much emotional and cognitive friction dissuading participants as something like an IQ test. It is counterintuitive and demoralizing, but anyone who has experience with these kinds of things will tell you the same thing.

If 23andMe instead asked customers to submit SAT/ACT/GRE scores, there are now many other problems to account for (other than a likely response rate of <=1% of the total customer base): dishonest or otherwise unreliable reporting, selecting for things that are not intelligence like conscientiousness, openness, and socioeconomic status, the mean/standard deviation of scores being different for each year (so you’d have to calculate z-scores differently based on the year participants took the tests), and the fact that it is much easier to hit the ceiling on the SAT/ACT/GRE (2-3 S.D., 1 in 741 at the rarest) than it is to hit the ceiling of a reliable IQ test (4 S.D., which is about 1 in 30,000).

Statistical models like those involved in GWASes follow one of many simple rules: crap in, crap out. If you want to find a lot of statistically significant SNPs for intelligence and you try using a shoddy proxy like standardized test score or an incomplete IQ test score as your phenotype, your GWAS is going to end up producing a bunch of shoddy SNPs for “intelligence”. Sample size (which is still an unsolved problem for the reasons aforementioned) has the potential to make up for obtaining a low amount of SNPs that have genome-wide significance, but it won’t get rid of entangled irrelevant SNPs if you’re measuring something other than straight up full-scale IQ. I really hate to be so pessimistic here, but it’s important to be realistic about these kinds of things, especially if you’re relying on it to play a critical role in your project’s success.
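(As a rough numeric check on the ceiling figures above, and a sketch of the year-specific standardization that self-reported SAT scores would require; the yearly means/SDs below are placeholder values, not real College Board norms.)

```python
# Rarity of hitting a test ceiling, assuming roughly normal scores,
# plus year-specific z-scoring for self-reported SAT results.
# The SAT norms below are made-up placeholders, not published norms.
from scipy.stats import norm

print(1 / norm.sf(3))   # ~741: a 3 SD ceiling is hit by about 1 in 741 people
print(1 / norm.sf(4))   # ~31,600: a 4 SD ceiling is hit by roughly 1 in 30,000

sat_norms = {2005: (1028, 210), 2015: (1006, 217)}  # year -> (mean, SD), hypothetical

def sat_to_z(score: float, year: int) -> float:
    """Convert a raw SAT score to a z-score using the norms for that test year."""
    mean, sd = sat_norms[year]
    return (score - mean) / sd

print(sat_to_z(1400, 2005), sat_to_z(1400, 2015))  # same raw score, different z-scores
```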
Taiwan is one of the more tenable counter-examples to what I said (I also thought of it), but there are still problems to overcome. In the UK Biobank, for example, their method of assessing “fluid intelligence”/”verbal-numerical ability” was totally abysmal. They gave participants 2 minutes to answer 13 IQ-test-esque multiple-choice questions and their score was based on the number of questions they answered correctly in the 2 minutes. I hope I don’t need to explain why this is not an adequate measure of fluid intelligence and that any IQ predictor built on that data is probably totally useless. I don’t know how Taiwan assesses intelligence in their biobank if at all, but if they do it anything like how the UK Biobank did it, that data will probably end up being similarly useless. Even then, there is still the problem of inadequate sample size if it’s not half a million or more, and the fact that, as I understand it, all of this will take a long time to complete.

My ultimate prediction regarding this obstacle is that in order to build an IQ predictor in a short amount of time that has enough quality data to uncover a sufficient abundance of causal alleles for intelligence, there will need to be monetary incentives for the sought hundreds of thousands of participants, actual full-scale IQ tests administered, and full genome sequencing. Again, I would be delighted to be wrong about all of this and I encourage anyone to reply with good reasons for why I might be.
So it sounds like there are potential solutions here and this isn’t necessarily a showstopper, especially if we can derisk using animal testing in cows or pigs.
As mentioned in my reply, I would tend to agree if your goal was to only make a few edits and thus use an AAV only once or twice to accomplish this. This has been demonstrated to be relatively safe provided the right serotype is used, and there are even FDA-approved gene delivery therapies that use AAVs in the CNS. Even in these cases though, the risk of inducing an inflammatory response or killing cells is never zero even with correct dosing and single exposure, and for your purposes you would need to use at least hundreds of AAV injections to deliver hundreds of edits, and thousands of AAV injections to deliver thousands of edits. Again, barring some breakthrough in AAVs as delivery vectors, this number of uses in a single person’s CNS practically guarantees that you will end up inducing some significant/fatal inflammatory response or cytolysis. This is without even mentioning the problems of developing immunity to the viruses and low transduction efficiency, which are another couple of breakthroughs away from being solved.
This is an update for me. I didn’t previously realize that the mRNA for a base or prime editor could itself trigger the innate immune system. I wonder how serious of a concern this would actually be?
You may find these two papers elucidating: one, two
Yeah, one other delivery vector I’ve looked into over the last couple of days is extracellular vesicles. They seem to have basically zero problems with toxicity because the body already uses them to shuttle stuff around. And you can stick peptides on their surface similar to what we proposed with lipid nanoparticles.
This is interesting. This is the first time I’m hearing of these as they pertain to potential gene therapy applications. Here are some papers about them I found that you may find useful as you consider them as an option: one, two, three
Thanks for this example. I don’t think I would be particularly worried about this in the context of off-target edits or indels (provided the distribution is similar to that of naturally occurring mutations), but I can see it potentially being an issue if the intelligence-modifying alleles themselves work via regulating something like the neuron’s internal clock.
To be candid with you, I was mostly just trying to play devil’s advocate regarding mosaicism. Like you mention, neurons accumulate random mutations over the lifespan anyway, and it doesn’t necessarily seem to be detrimental, though one can’t disentangle the cognitive decline due to this small-scale mosaicism versus that due to aging in general. It’s also possible that having an order of magnitude increase in mosaicism (e.g., going from 1,000 to 10,000 random mutations across neurons) induces some phase transition in its latent perniciousness. Either way, if you solve either the transduction efficiency or immunological tolerance issues (if transduction efficiency is low, just apply multiple rounds of the same edits), mosaicism won’t be much of a problem if it was ever going to be one.
You could elect to use proxy measures like educational attainment, SAT/ACT/GRE score, most advanced math class completed, etc., but my intuition is that they are influenced by too many things other than pure g to be useful for the desired purpose. It’s possible that I’m being too cynical about this obstacle and I would be delighted if someone could give me good reasons why I’m wrong.
This is just measurement error and can be handled by normal psychometric approaches like SEM (eg. GSEM). You lose sample efficiency, but there’s no reason you can’t measure and correct for the measurement error. What the error does is render the estimates of each allele too small (closer to zero from either direction), but if you know how much error there is, you can just multiply back up to recover the real effect you would see if you had been able to use measurement with no error. In particular, for an editing approach, you don’t need to know the estimate at all—you only need to know that it is non-zero, because you are identifying the desired allele.
So, every measurement on every individual you get, whether it’s EA or SAT or GRE or parental degree or a 5-minute web quiz, helps you narrow down the set of 10,000 alleles that matters from the starting set of a few million. They just might not narrow it down much, so it becomes a more decision-theoretic question of how expensive is which data to collect and what maximizes your bang-for-buck. (Historically, the calculus has favored low-quality measurements which could be collected on a large number of people.)
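(A toy illustration of the “multiply back up” correction described above, assuming the proxy is just a noisy, standardized indicator of the latent trait with a known correlation rho; the rho and effect size are made up for illustration.)

```python
# Simulate a SNP whose true standardized effect on latent g is beta_true, observe it
# through a noisy proxy phenotype (proxy = rho*g + noise), and recover the true
# effect by dividing the attenuated GWAS estimate by rho. rho and beta_true are assumed.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
rho = 0.6          # assumed proxy-g correlation (e.g., a quick online test vs. FSIQ)
beta_true = 0.02   # assumed true per-allele effect on g (standardized)

geno = rng.binomial(2, 0.4, n)                        # SNP genotypes (0/1/2)
x = (geno - geno.mean()) / geno.std()                 # standardized genotype
g = beta_true * x + rng.normal(0, np.sqrt(1 - beta_true**2), n)
proxy = rho * g + rng.normal(0, np.sqrt(1 - rho**2), n)

beta_obs = (x @ proxy) / (x @ x)   # simple regression slope, as in a GWAS
print(beta_obs)                    # attenuated: roughly rho * beta_true
print(beta_obs / rho)              # "multiplied back up": roughly beta_true
```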
The problem could potentially be solved by conducting GWASes that identify the SNPs of things known to correlate with the proxy measure other than intelligence and then subtracting those SNPs, but like you mention later in your reply, the question is what approach is faster and/or cheaper. Unless there is some magic I don’t know about with GSEM, I can’t see a convincing reason why it would have intelligence SNPs buoy to the top of lists ranked on the basis of effect size, especially with the sample size we would likely end up working with (<1 million). If you don’t know what SNPs contribute to intelligence versus something else, applying a flat factor to each allele’s effect size would just increase the scale of difference rather than help distill out intelligence SNPs. Considering the main limitation of this project is the number of edits they’re wanting to make, minimizing the number of allele flips while maximizing the effect on intelligence is one of the major goals here (although I’ve already stated why I think this project is infeasible). Another important thing to consider is that the significance of SNPs’ effects would be attenuated as the number of independent traits affecting the phenotype increases; if you’re only able to get 500,000 data points for the GWAS that uses SAT as the phenotype, you will most likely have the majority of causal intelligence SNPs falling below the genome-wide significance threshold of p < 5 × 10^-8.
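(To put rough numbers on the power loss at issue here: a standard single-SNP power calculation at the genome-wide threshold, where using a proxy correlated with IQ at rho shrinks the variance explained by rho²; the SNP effect size and rho are assumed for illustration.)

```python
# Power to detect one SNP at p < 5e-8 in a GWAS of N people, for a direct IQ
# phenotype versus a proxy correlated with IQ at rho. r2 and rho are assumptions.
from scipy.stats import chi2, ncx2

crit = chi2.isf(5e-8, df=1)   # chi-square critical value at the genome-wide threshold

def power(n: int, r2: float) -> float:
    """Power for a SNP explaining a fraction r2 of phenotypic variance."""
    return ncx2.sf(crit, df=1, nc=n * r2)

n = 500_000
r2_iq = 1e-4   # assumed: the SNP explains 0.01% of variance in measured IQ
rho = 0.6      # assumed correlation between the proxy (e.g., SAT) and IQ

print(power(n, r2_iq))            # power with a direct IQ phenotype (~0.95 here)
print(power(n, r2_iq * rho**2))   # power once diluted by the proxy (~0.11 here)
```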
It’s also possible that optimizing people’s brains (or a group of embryos) for acing the SAT to the point where they have a 100% chance of achieving this brings us as close to a superintelligent human as we need until the next iteration of superintelligent human.
The tragedy of all of this is that it’s basically a money problem—if some billionaire could just unilaterally fund genome sequencing and IQ testing en masse and not get blocked by some government or other bureaucratic entity, all of this crap about building an accurate predictor would disappear and we’d only ever need to do this once.
The problem could potentially be solved by conducting GWASes that identify the SNPs of things known to correlate with the proxy measure other than intelligence and then subtracting those SNPs
More or less. If you have an impure measurement like ‘years of education’ which lumps in half intelligence and half other stuff (and you know this, even if you never have measurements of IQ and EDU and the other-stuff within individuals, because you can get precise genetic correlations from much smaller sample sizes where you compare PGSes & alternative methods like GCTA or cross-twin correlations), then you can correct the respective estimates of both intelligence and other-stuff, and you can pool with other GWASes on other traits/cohorts to estimate all of these simultaneously. This gets you estimates of each latent trait effect size per allele, and you just rank and select.
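(A stripped-down sketch in the spirit of this description, not the actual Genomic SEM estimator: treat each observed per-allele effect on impure phenotypes as loadings on latent traits and solve for the latent effects, then rank. The loadings and effect sizes below are invented for illustration.)

```python
# Per-allele effects on two impure phenotypes are modeled as a loading matrix times
# latent-trait effects; solving the system gives per-allele effects on latent IQ,
# which you can rank and select on. All numbers here are illustrative assumptions.
import numpy as np

# Rows: observed phenotypes; columns: loadings on [latent IQ, latent "other stuff"].
Lambda = np.array([
    [0.5, 0.5],   # years of education: half intelligence, half other stuff
    [0.7, 0.2],   # quick online test: mostly intelligence
])

# Observed standardized per-allele GWAS effects (rows match the phenotypes above)
# for three candidate SNPs.
beta_obs = np.array([
    [0.010, 0.002, 0.006],
    [0.013, 0.001, 0.003],
])

# Solve Lambda @ beta_latent ≈ beta_obs for the latent effects of each SNP.
beta_latent, *_ = np.linalg.lstsq(Lambda, beta_obs, rcond=None)
print(beta_latent[0])   # estimated per-allele effects on latent IQ -> rank and select
```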
you will most likely have the majority of causal intelligence SNPs falling below the genome-wide significance threshold of p < 5 × 10^-8.
A statistical-significance threshold is irrelevant NHST mumbo-jumbo. What you care about is posterior probability of the causal variant’s effect being above the cost-safety threshold, whatever that may be, but which will have nothing at all to do with ‘genome-wide statistical significance’.
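(A toy version of that posterior-probability criterion, using a normal shrinkage prior on the true effect; the prior SD, GWAS estimate, and cost-safety threshold are all assumed numbers.)

```python
# P(true effect > threshold | GWAS estimate) under a N(0, prior_sd^2) prior on the
# true per-allele effect. A SNP far from genome-wide significance can still have a
# high posterior probability of a worthwhile effect.
from scipy.stats import norm

def prob_effect_above(beta_hat: float, se: float, prior_sd: float, threshold: float) -> float:
    shrink = prior_sd**2 / (prior_sd**2 + se**2)
    post_mean = shrink * beta_hat           # posterior mean (shrunk toward zero)
    post_sd = (shrink * se**2) ** 0.5       # posterior standard deviation
    return norm.sf(threshold, loc=post_mean, scale=post_sd)

# z = 3 here (p ~ 0.003, nowhere near 5e-8), yet the posterior probability that the
# effect exceeds the assumed cost-safety threshold comes out high (~0.94).
print(prob_effect_above(beta_hat=0.015, se=0.005, prior_sd=0.01, threshold=0.005))
```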
A statistical-significance threshold is irrelevant NHST mumbo-jumbo. What you care about is posterior probability of the causal variant’s effect being above the cost-safety threshold, whatever that may be, but which will have nothing at all to do with ‘genome-wide statistical significance’.
I’m aware of this, but if you’re just indiscriminately shoveling heaps of edits into someone’s genome based on a GWAS with too low a sample size to reveal causal SNPs for the desired trait, you’ll be editing a whole bunch of what are actually tags, a whole bunch of things that are related to independent traits other than intelligence, and a whole bunch of random irrelevant alleles that made it into your selection by random chance. This is a sure-fire way to make a therapy that has no chance of working, and if an indiscriminate shotgun approach like this is used in experiments, the combinatorics of the matter dictates that there are more possible sure-to-fail multiplex genome editing therapies than there are humans on the Earth, let alone those willing to be guinea pigs for an experiment like this. Having a statistical significance threshold imposes a bar to pass for SNPs that at least makes the therapy less of an ascertained suicide mission.
EDIT: misinterpreted what the other party was saying.
if you’re just indiscriminately shoveling heaps of edits into someone’s genome based on a GWAS with too low a sample size to reveal causal SNPs for the desired trait, you’ll be editing a whole bunch of what are actually tags,
What I said was “What you care about is posterior probability of the causal variant’s effect being above the cost-safety threshold”. If you are ‘indiscriminately shoveling’, then you apparently did it wrong.
a whole bunch of things that are related to independent traits other than intelligence,
Pretty much all SNPs are related to something or other. The question is what is the average effect. Given the known genetic correlations, if you pick the highest posterior probability ones for intelligence, then the average effect will be good.
(And in any case, one should be aiming for maximizing the gain across all traits as an index score.)
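(A minimal sketch of ranking candidate edits by an index score across traits instead of intelligence alone; the per-allele effects, trait weights, and edit budget are all made-up illustration values.)

```python
# Rank candidate edits by a weighted sum of their expected effects across traits
# and take the top ones within an edit budget. Effects and weights are illustrative.
import numpy as np

# Rows: candidate SNP flips; columns: standardized effects on [IQ, disease liability, height].
effects = np.array([
    [ 0.020, -0.010, 0.00],
    [ 0.015,  0.020, 0.01],
    [ 0.005, -0.001, 0.00],
    [-0.010, -0.030, 0.00],
])
weights = np.array([1.0, -0.5, 0.0])   # value per unit of each trait (disease liability is bad)

index = effects @ weights              # expected index gain per edit
edit_budget = 2
chosen = np.argsort(index)[::-1][:edit_budget]
print(chosen, index[chosen])           # the edits with the largest expected index gain
```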
and a whole bunch of random irrelevant alleles that made it into your selection by random chance.
If they’re irrelevant, then there’s no problem.
This is a sure-fire way to make a therapy that has no chance of working,
No it’s not. If you’re using common SNPs which already exist, why would it ‘have no chance of working’? If some random SNP had some devastating effect on intelligence, then it would not be ranked high.
Even out of this 10%, slightly less than 10% of that 10% responded to a 98-question survey, so a generous estimate of how many of their customers they got to take this survey is 1%. And this was just a consumer experience survey, which does not have nearly as much emotional and cognitive friction dissuading participants as something like an IQ test.
What if 23&me offered a $20 discount for uploading old SAT scores? I guess someone would set up a site that generates realistically distributed fake SAT scores that everyone would use. Is there a standardized format for results that would be easy to retrieve and upload but hard to fake? Eh, idk, maybe not. Could a company somehow arrange to buy the scores of consenting customers directly from the testing agency? Agree that this seems hard.
Statistical models like those involved in GWASes follow one of many simple rules: crap in, crap out. If you want to find a lot of statistically significant SNPs for intelligence and you try using a shoddy proxy like standardized test score or an incomplete IQ test score as your phenotype, your GWAS is going to end up producing a bunch of shoddy SNPs for “intelligence”. Sample size (which is still an unsolved problem for the reasons aforementioned) has the potential to make up for obtaining a low amount of SNPs that have genome-wide significance, but it won’t get rid of entangled irrelevant SNPs if you’re measuring something other than straight up full-scale IQ.
This seems unduly pessimistic to me. The whole interesting thing about g is that it’s easy to measure and correlates with tons of stuff. I’m not convinced there’s any magic about FSIQ compared to shoddier tests. There might be important stuff that FSIQ doesn’t measure very well that we’d ideally like to select/edit for, but using FSIQ is much better than nothing. Likewise, using a poor man’s IQ proxy seems much better than nothing.
I wouldn’t call it magic, but what makes FSIQ tests special is that they’re specifically crafted to estimate g. To your point, anything that involves intelligence (SAT, ACT, GRE, random trivia quizzes, tying your shoes) will positively correlate with g even if only weakly, but the correlations between g factor scores and full-scale IQ scores from the WAIS have been found to be >0.95, according to the same Wikipedia page you linked in a previous reply to me. Like both of us mentioned in previous replies, using imperfect proxy measures would necessitate multiplying your sample size because of diluted p-values and effect sizes, along with selecting for many things that are not intelligence. There are more details about this in my reply to gwern’s reply to me.
This seems unduly pessimistic to me. The whole interesting thing about g is that it’s easy to measure and correlates with tons of stuff. I’m not convinced there’s any magic about FSIQ compared to shoddier tests. There might be important stuff that FSIQ doesn’t measure very well that we’d ideally like to select/edit for, but using FSIQ is much better than nothing. Likewise, using a poor man’s IQ proxy seems much better than nothing.
This may have missed your point; you seem more concerned about selecting for unwanted covariates than ‘missing things’, which is reasonable. I might remake the same argument by suspecting that FSIQ probably has some weird covariates too—but that seems weaker. E.g. if a proxy measure correlates with FSIQ at .7, then the ‘other stuff’ (insofar as it is heritable variation and not just noise) will also correlate with the proxy at .7, and so by selecting on this measure you’d be selecting quite strongly for the ‘other stuff’, which, yeah, isn’t great. FSIQ, insofar as it had any weird unwanted covariates, would probably be much less correlated with them than .7.
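(A quick simulation of the decomposition in that comment: a proxy loading on FSIQ at .7 carries roughly .71 worth of ‘other stuff’, so selecting on the proxy selects almost as hard on the other stuff; independence of the two components is assumed.)

```python
# If proxy = 0.7*FSIQ + sqrt(1 - 0.7^2)*other (all standardized and independent),
# the proxy correlates ~0.70 with FSIQ and ~0.71 with the non-IQ "other stuff".
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
fsiq = rng.normal(size=n)
other = rng.normal(size=n)    # heritable non-IQ variation, assumed independent of FSIQ
proxy = 0.7 * fsiq + (1 - 0.7**2) ** 0.5 * other

print(np.corrcoef(proxy, fsiq)[0, 1])    # ~0.70
print(np.corrcoef(proxy, other)[0, 1])   # ~0.71
```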