You could elect to use proxy measures like educational attainment, SAT/ACT/GRE score, most advanced math class completed, etc., but my intuition is that they are influenced by too many things other than pure g to be useful for the desired purpose. It’s possible that I’m being too cynical about this obstacle and I would be delighted if someone could give me good reasons why I’m wrong.
This is just measurement error and can be handled by normal psychometric approaches like SEM (eg. GSEM). You lose sample efficiency, but there’s no reason you can’t measure and correct for the measurement error. What the error does is render the estimates of each allele too small (closer to zero from either direction), but if you know how much error there is, you can just multiply back up to recover the real effect you would see if you had been able to use measurement with no error. In particular, for an editing approach, you don’t need to know the estimate at all—you only need to know that it is non-zero, because you are identifying the desired allele.
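A minimal toy of that attenuation-and-correction point (simulated data; the reliability is assumed known from something like test–retest, and all numbers are made up):

```python
# Toy sketch (not anyone's actual pipeline): measurement error in a
# standardized phenotype shrinks per-allele effect estimates toward zero,
# and a known reliability lets you multiply the estimate back up.
import numpy as np

rng = np.random.default_rng(0)
n, maf, beta_true = 200_000, 0.4, 0.05            # hypothetical values

g = rng.binomial(2, maf, n)
g = (g - g.mean()) / g.std()                      # standardized genotype
trait = beta_true * g + rng.normal(0, np.sqrt(1 - beta_true**2), n)

reliability = 0.6                                 # assumed known (e.g. test-retest)
noise = rng.normal(0, np.sqrt((1 - reliability) / reliability), n)
proxy = trait + noise
proxy = (proxy - proxy.mean()) / proxy.std()      # GWAS-style standardization

beta_obs = np.polyfit(g, proxy, 1)[0]             # attenuated estimate
beta_corrected = beta_obs / np.sqrt(reliability)  # disattenuated estimate

print(f"true {beta_true:.3f}  observed {beta_obs:.3f}  corrected {beta_corrected:.3f}")
# observed ~ beta_true * sqrt(reliability); corrected ~ beta_true,
# at the cost of a larger standard error (the lost sample efficiency).
```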
So, every measurement on every individual you get, whether it’s EA or SAT or GRE or parental degree or a 5-minute web quiz, helps you narrow down the set of 10,000 alleles that matters from the starting set of a few million. They just might not narrow it down much, so it becomes a more decision-theoretic question of how expensive is which data to collect and what maximizes your bang-for-buck. (Historically, the calculus has favored low-quality measurements which could be collected on a large number of people.)
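To put rough, purely illustrative numbers on that bang-for-buck calculus (the costs and validities below are invented, and the approximation that the per-SNP signal scales roughly with n × validity² ignores everything else a cheap proxy drags in):

```python
# Back-of-the-envelope comparison of cheap-noisy vs. expensive-precise
# phenotyping under a fixed budget. All figures are hypothetical.

def effective_n(n: int, validity: float) -> float:
    """Approximate 'as-if perfectly measured' sample size for GWAS power."""
    return n * validity**2

budget = 50_000_000  # hypothetical dollars

options = {
    # name: (cost per participant, assumed correlation with g) -- made-up numbers
    "full IQ battery":    (500, 0.95),
    "SAT/ACT records":    (50, 0.80),
    "5-minute web quiz":  (5, 0.60),
    "years of education": (1, 0.50),
}

for name, (cost, validity) in options.items():
    n = budget // cost
    print(f"{name:20s} n={n:>10,d}  effective n≈{effective_n(n, validity):>12,.0f}")
```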
The problem could potentially be solved by conducting GWASes that identify the SNPs of things known to correlate with the proxy measure other than intelligence and then subtracting those SNPs, but as you mention later in your reply, the question is which approach is faster and/or cheaper. Unless there is some magic I don’t know about with GSEM, I can’t see a convincing reason why it would make intelligence SNPs rise to the top of lists ranked by effect size, especially with the sample size we would likely end up working with (<1 million). If you don’t know which SNPs contribute to intelligence versus something else, applying a flat correction factor to each allele’s effect size would just rescale the differences rather than help distill out the intelligence SNPs. Considering that the main limitation of this project is the number of edits they’re planning to make, minimizing the number of allele flips while maximizing the effect on intelligence is one of the major goals here (although I’ve already stated why I think this project is infeasible). Another important thing to consider is that each SNP’s estimated effect (and hence its significance) would be attenuated as the number of independent traits affecting the phenotype increases; if you’re only able to get 500,000 data points for the GWAS that uses SAT as the phenotype, you will most likely have the majority of causal intelligence SNPs falling below the genome-wide significance threshold of p < 5 × 10⁻⁸.
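To put rough numbers on that last worry (illustrative per-SNP effect sizes, not estimates from any real GWAS):

```python
# Rough power calculation for the "most causal SNPs fall below 5e-8" concern:
# per-SNP variance explained is tiny, and an imperfect proxy shrinks it further
# by validity^2, so the noncentrality parameter n * R^2 often sits below the
# genome-wide-significance hurdle. Numbers are hypothetical.
from scipy.stats import chi2, ncx2

def power_at_gws(n, variance_explained, validity=1.0, alpha=5e-8):
    """P(a single-SNP chi-square test clears the genome-wide threshold)."""
    crit = chi2.isf(alpha, df=1)
    ncp = n * variance_explained * validity**2
    return ncx2.sf(crit, df=1, nc=ncp)

for ve in (5e-5, 1e-4, 2e-4):                  # hypothetical per-SNP R^2 values
    direct = power_at_gws(500_000, ve, validity=1.0)
    proxy  = power_at_gws(500_000, ve, validity=0.7)
    print(f"R^2={ve:.0e}: power {direct:.2f} with IQ, {proxy:.2f} with a 0.7-validity proxy")
```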
It’s also possible that optimizing people’s brains (or a group of embryos) for acing the SAT, to the point where they have a 100% chance of doing so, brings us as close to a superintelligent human as we need until the next iteration of superintelligent humans.
The tragedy of all of this is that it’s basically a money problem—if some billionaire could just unilaterally fund genome sequencing and IQ testing en masse and not get blocked by some government or other bureaucratic entity, all of this crap about building an accurate predictor would disappear and we’d only ever need to do this once.
The problem could potentially be solved by conducting GWASes that identify the SNPs of things known to correlate with the proxy measure other than intelligence and then subtracting those SNPs
More or less. If you have an impure measurement like ‘years of education’ which lumps in half intelligence and half other stuff (and you know this, even if you never have measurements of IQ and EDU and the other-stuff within individuals, because you can get precise genetic correlations from much smaller sample sizes where you compare PGSes & alternative methods like GCTA or cross-twin correlations), then you can correct the respective estimates of both intelligence and other-stuff, and you can pool with other GWASes on other traits/cohorts to estimate all of these simultaneously. This gets you estimates of each latent trait effect size per allele, and you just rank and select.
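A deliberately stripped-down stand-in for that estimate-latent-effects-then-rank step (made-up factor loadings, two latent factors, no LD; real Genomic SEM works from GWAS summary statistics and an estimated genetic covariance matrix rather than anything this simple):

```python
# Toy version: each observed GWAS phenotype is modeled as a mix of a latent
# "intelligence" factor and a latent "other stuff" factor, and per-SNP latent
# effects are recovered by solving the resulting linear system.
import numpy as np

rng = np.random.default_rng(1)
n_snps = 5_000

# Assumed loadings of observed phenotypes on (intelligence, other);
# rows = years of education, SAT, 5-minute quiz. Values are illustrative.
loadings = np.array([
    [0.5, 0.5],
    [0.8, 0.3],
    [0.6, 0.2],
])

# Simulated true latent per-SNP effects and noisy observed GWAS betas.
latent = rng.normal(0, 0.01, (n_snps, 2))          # columns: intelligence, other
se = 0.004                                         # pretend per-study standard error
observed = latent @ loadings.T + rng.normal(0, se, (n_snps, 3))

# Recover latent effects per SNP by ordinary least squares (a real analysis
# would weight by the betas' standard errors and their covariance).
latent_hat, *_ = np.linalg.lstsq(loadings, observed.T, rcond=None)
latent_hat = latent_hat.T

top = np.argsort(-latent_hat[:, 0])[:100]          # rank on estimated intelligence effect
print("mean true intelligence effect of top-100 picks:", latent[top, 0].mean().round(4))
print("mean true intelligence effect overall:", latent[:, 0].mean().round(4))
```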
you will most likely have the majority of causal intelligence SNPs falling below the genome-wide significance threshold of p < 5 × 10⁻⁸.
A statistical-significance threshold is irrelevant NHST mumbo-jumbo. What you care about is posterior probability of the causal variant’s effect being above the cost-safety threshold, whatever that may be, but which will have nothing at all to do with ‘genome-wide statistical significance’.
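To make that concrete with a toy conjugate model (hypothetical numbers; a real calculation would also fold in LD and the probability that the variant is causal at all):

```python
# Minimal sketch of "posterior probability that the effect clears a
# cost-safety threshold" with a normal prior on the per-allele effect.
from scipy.stats import norm

def prob_effect_above(beta_hat, se, prior_sd, threshold):
    """P(true standardized effect > threshold | GWAS estimate), normal-normal model."""
    shrink = prior_sd**2 / (prior_sd**2 + se**2)     # shrinkage toward 0
    post_mean = shrink * beta_hat
    post_sd = (shrink * se**2) ** 0.5
    return norm.sf(threshold, loc=post_mean, scale=post_sd)

# A SNP far short of genome-wide significance (z ~ 3.3, p ~ 1e-3) can still be
# a very good bet by this criterion:
print(prob_effect_above(beta_hat=0.010, se=0.003, prior_sd=0.01, threshold=0.005))
```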
A statistical-significance threshold is irrelevant NHST mumbo-jumbo. What you care about is posterior probability of the causal variant’s effect being above the cost-safety threshold, whatever that may be, but which will have nothing at all to do with ‘genome-wide statistical significance’.
I’m aware of this, but if you’re just indiscriminately shoveling heaps of edits into someone’s genome based on a GWAS with too low a sample size to reveal causal SNPs for the desired trait, you’ll be editing a whole bunch of what are actually tags, a whole bunch of things that are related to independent traits other than intelligence, and a whole bunch of random irrelevant alleles that made it into your selection by random chance. This is a sure-fire way to make a therapy that has no chance of working, and if an indiscriminate shotgun approach like this is used in experiments, the combinatorics of the matter dictates that there are more possible sure-to-fail multiplex genome-editing therapies than there are humans on Earth, let alone humans willing to be guinea pigs for such an experiment. Having a statistical-significance threshold at least imposes a bar for SNPs to pass, which makes the therapy less of a guaranteed suicide mission.

EDIT: misinterpreted what the other party was saying.
if you’re just indiscriminately shoveling heaps of edits into someone’s genome based on a GWAS with too low a sample size to reveal causal SNPs for the desired trait, you’ll be editing a whole bunch of what are actually tags,
What I said was “What you care about is posterior probability of the causal variant’s effect being above the cost-safety threshold”. If you are ‘indiscriminately shoveling’, then you apparently did it wrong.
a whole bunch of things that are related to independent traits other than intelligence,
Pretty much all SNPs are related to something or other. The question is what is the average effect. Given the known genetic correlations, if you pick the highest posterior probability ones for intelligence, then the average effect will be good.
(And in any case, one should be aiming for maximizing the gain across all traits as an index score.)
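As a crude sketch of such an index score (hypothetical effects and weights):

```python
# Rank candidate edits on a weighted sum of estimated effects across traits
# rather than on the intelligence effect alone. All numbers are invented.
import numpy as np

rng = np.random.default_rng(2)
n_candidates = 1_000

# Columns: estimated standardized effects on (intelligence, disease risk, height)
effects = rng.normal(0, 0.01, (n_candidates, 3))

# Value placed on one SD of each trait (made-up weights; disease risk is bad).
weights = np.array([1.0, -0.8, 0.1])

index = effects @ weights
edit_order = np.argsort(-index)          # best edits first
print(effects[edit_order[:5]].round(4))  # effect profiles of the top 5 picks
```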
and a whole bunch of random irrelevant alleles that made it into your selection by random chance.
If they’re irrelevant, then there’s no problem.
This is a sure-fire way to make a therapy that has no chance of working,
No it’s not. If you’re using common SNPs which already exist, why would it ‘have no chance of working’? If some random SNP had some devastating effect on intelligence, then it would not be ranked high.